CONVERTING TAXONOMIC DESCRIPTIONS TO NEW DIGITAL FORMATS

Title: CONVERTING TAXONOMIC DESCRIPTIONS TO NEW DIGITAL FORMATS
Publication Type: Journal Article
Year of Publication: 2008
Authors: Cui, Hong
Journal Title: Biodiversity Informatics
Volume: 5
Pages: 20-40
Keywords: digital formats, morphological descriptions, semantic markup, supervised machine learning, system evaluation, taxonomic descriptions, unsupervised machine learning, XML
Abstract

Most taxonomic descriptions are currently in print. Most digital descriptions are in formats intended for human readers, such as DOC, HTML, or PDF. These formats do not expose the rich semantics of taxonomic descriptions for computer-aided processing. Newer digital formats, such as XML and RDF, accommodate semantic annotations that allow a computer to process those semantics on a human's behalf. This opens up a wide range of innovative uses of taxonomic descriptions, including more precise and flexible searching; integration of morphological, genomic, georeference, or other information; automatic generation of taxonomic keys; and knowledge mining and visualization of taxonomic data. This paper reports our experience developing an automated semantic markup system named MARTT and discusses the challenging issues involved. To address these issues, a number of utilities were implemented to make MARTT a more operable system. The utilities can be used to speed up the preparation of training examples for MARTT, to facilitate the creation of more comprehensive annotation schemas, and to predict system performance on a new collection of descriptions. MARTT has been tested on several plant and algal taxonomic publications, including Flora of China, Flora of North America, and Flora of North Central Texas.
