The Text Encoding Initiative (TEI) has produced an XML format for exchanging texts.
“TEI Lite is a specific customization of the TEI tagset, designed to meet ‘90% of the needs of 90% of the TEI user community’. Due to its simplicity and the fact that it can be learned with relative ease, TEI Lite has been widely adopted, particularly by beginners and by big institutional projects that rely on large teams of encoders to markup their documents.”
The TEI Consortium includes many prestigious members, such as Oxford University (together with several of its associated units, such as its digital library) and Yale University.
TEI is a basic set of tags focused on document structure. It is free format, though a hierarchy can be implemented by nesting <div> tags.
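As a rough illustration of how nested <div> tags give a document its hierarchy, the sketch below parses a small TEI-style fragment with Python's standard library and lists the divisions it finds. The fragment is made up for this example and is not a complete, valid TEI document (it omits the teiHeader, for instance); the `outline` helper is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative fragment only: headings and text are invented, and the
# mandatory teiHeader is omitted to keep the example short.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter">
        <head>Chapter 1</head>
        <p>Opening paragraph.</p>
        <div type="section">
          <head>Section 1.1</head>
          <p>Nested paragraph.</p>
        </div>
      </div>
    </body>
  </text>
</TEI>"""


def outline(element, depth=0):
    """Return (depth, type, heading) tuples for every nested <div>."""
    rows = []
    for div in element.findall(f"{{{TEI_NS}}}div"):
        head = div.find(f"{{{TEI_NS}}}head")
        heading = head.text if head is not None else None
        rows.append((depth, div.get("type"), heading))
        # Recurse so that divisions inside divisions are reported one level deeper.
        rows.extend(outline(div, depth + 1))
    return rows


body = ET.fromstring(SAMPLE).find(f".//{{{TEI_NS}}}body")
for depth, div_type, heading in outline(body):
    print("  " * depth + f"{div_type}: {heading}")
```

The depth of each division comes purely from how the <div> elements are nested, which is why the format can stay free of a fixed chapter/section vocabulary.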
TEI Lite supports many features we would find beneficial including:
In addition, it is extensible: a TEI document can reference other XML tagsets, such as MathML, and TEI text can be embedded within other types of XML documents, such as METS and MODS records.
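The mixing of tagsets relies on XML namespaces. The sketch below, which is only an assumption about how such a document might look, places a MathML expression inside a TEI <formula> element and then picks out the elements of each vocabulary by namespace. The sample text and the `m` prefix are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for TEI and MathML; the prefixes are arbitrary local choices.
NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "m": "http://www.w3.org/1998/Math/MathML",
}

# Illustrative fragment only: a TEI paragraph holding a MathML expression.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0"
   xmlns:m="http://www.w3.org/1998/Math/MathML">
  A trivial identity:
  <formula notation="mathml">
    <m:math><m:mi>a</m:mi><m:mo>=</m:mo><m:mi>b</m:mi></m:math>
  </formula>
</p>"""

root = ET.fromstring(SAMPLE)

# TEI elements and MathML elements are addressed separately by namespace.
formulas = root.findall(".//tei:formula", NS)
math_tokens = [el.text for el in root.findall(".//m:math/*", NS)]

print(len(formulas), math_tokens)
```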
The project website has more detail.
Examples cited in the supplied documentation include Jane Eyre, though TEI Lite can also cope with technical documents and has specific tags for mathematical formulae, sample program code, etc.
Projects using TEI don’t quite range from A to Z, but nearly make it, starting from African American Women Writers of the 19th Century and ending at Wright American Fiction 1851-1875.
There are a variety of tools mentioned in the TEI wiki, including many generic XML editors such as Oxygen and Notepad++. Such is the widespread use of TEI that its schemas are included in several XML editors, including Editix and XML Copy Editor. In addition, XSLTs are available to convert TEI XML to LaTeX and other formats, and there is an OpenOffice plug-in to import and export TEI.
TEI is a well-established and widely used XML schema. It is used as a staging post in the encoding of documents in taXMLit, and so has proven suitable for use in the biodiversity domain without the need for customisation.
The taXMLit / INOTAXA project uses TEI Lite as a first step towards a full markup in taXMLit. To undertake the initial conversion of a document to TEI Lite, rules have to be developed for the taxonomy-specific components of the text; these rules can be modified to accommodate the different editorial styles of the source publications.
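The idea of style-specific conversion rules can be sketched as a small table of patterns, each mapped to the TEI element that should wrap a matching line. This is not the taXMLit / INOTAXA implementation, whose rules are not described here; the patterns, the `to_tei_lines` helper and the sample text (including the placeholder name "Aus bus") are all hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical rules, for illustration only. Each pairs a pattern with the
# TEI element used to wrap a matching line; a real rule set would be tuned
# to the editorial style of each source publication.
RULES = [
    (re.compile(r"^[A-Z][A-Z ]+$"), "head"),              # all-caps line -> heading
    (re.compile(r"^[A-Z][a-z]+ [a-z]+ [A-Z]"), "label"),  # "Genus species Author..." -> label
]


def to_tei_lines(lines):
    """Wrap each non-empty line in the element chosen by the first matching rule."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Fall back to a plain paragraph when no rule matches.
        tag = next((t for pattern, t in RULES if pattern.search(line)), "p")
        out.append(f"<{tag}>{escape(line)}</{tag}>")
    return out


sample = [
    "EXAMPLE FAMILY",
    "",
    "Aus bus Smith, 1900",
    "Found under stones & logs.",
]
for tei_line in to_tei_lines(sample):
    print(tei_line)
```

Keeping the rules in a separate table means a different publication style can be handled by swapping the table rather than rewriting the conversion loop.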
Comments
GENIA and TEI
The GENIA corpus, which we often refer to as an exemplar corpus for what we would like to achieve for biodiversity, has support for converting GPML to TEI markup.