TEI Lite

TEI (Text Encoding Initiative)

TEI has produced an XML format for exchanging texts.

“TEI Lite is a specific customization of the TEI tagset, designed to meet ‘90% of the needs of 90% of the TEI user community’. Due to its simplicity and the fact that it can be learned with relative ease, TEI Lite has been widely adopted, particularly by beginners and by big institutional projects that rely on large teams of encoders to markup their documents.”

The TEI Consortium includes many prestigious members, such as Oxford University (together with several of its associated units, such as its digital library) and Yale University.

Description

TEI is a basic set of tags focused on document structure. It is free-format, though a hierarchy can be implemented through the use of <div> tags.
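As a rough illustration of how nested <div> tags give a document a hierarchy, the following Python sketch walks a small fragment and builds an outline. The fragment is a hypothetical, simplified example (no TEI header and no namespace), and the "chapter" and "section" values of @type are assumptions chosen for illustration rather than anything prescribed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-style fragment (no namespace, no header).
SAMPLE = """
<text>
  <body>
    <div type="chapter">
      <head>Introduction</head>
      <p>Opening paragraph.</p>
      <div type="section">
        <head>Scope</head>
        <p>What the work covers.</p>
      </div>
    </div>
    <div type="chapter">
      <head>Descriptions</head>
      <p>Species accounts.</p>
    </div>
  </body>
</text>
"""


def outline(element, depth=0):
    """Return (depth, type, heading) tuples for every nested <div>."""
    rows = []
    # findall("div") matches direct children only, so recursion tracks depth.
    for div in element.findall("div"):
        head = div.find("head")
        title = head.text if head is not None else ""
        rows.append((depth, div.get("type", ""), title))
        rows.extend(outline(div, depth + 1))
    return rows


body = ET.fromstring(SAMPLE).find("body")
for depth, kind, title in outline(body):
    print("  " * depth + f"{kind}: {title}")
```

The depth of each entry comes purely from how the <div> elements are nested, which is the sense in which the hierarchy is optional: a document with a single flat <div> would parse just as well.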

TEI Lite supports many features we would find beneficial including:

  • an <expan> tag to record the expansion of an abbreviation entered by the encoder. So, A. viridens could become <expan>Attelabus</expan> viridens
  • numerous date and time formats
  • bibliographic citations
  • semantic enhancement through the @type attribute
  • simple support for images and diagrams, including the ability to embed digitized versions of the graphic
  • cross references as used extensively in taXMLit.
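To make two of these features concrete, the Python sketch below reads a one-paragraph fragment that uses <expan> and the @type attribute. The fragment is hypothetical and simplified (no namespace), and the value "taxon" for @type is an assumption made for illustration, not a value defined by TEI Lite.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the @type value "taxon" is illustrative only.
SAMPLE = (
    '<p>The type of <name type="taxon"><expan>Attelabus</expan> viridens</name> '
    "is redescribed.</p>"
)

para = ET.fromstring(SAMPLE)

# Every expansion recorded by the encoder.
expansions = [e.text for e in para.iter("expan")]

# Full text of each element that carries a @type attribute, keyed by its type.
typed = {
    el.get("type"): "".join(el.itertext())
    for el in para.iter()
    if el.get("type")
}

print(expansions)
print(typed)
```

Because the expansion is marked up rather than silently substituted, a later processing step can still tell which words were supplied by the encoder.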

In addition, TEI is extensible: a TEI document can reference other XML tagsets, such as MathML, and TEI text can be embedded within other types of XML documents, such as METS and MODS records.
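A minimal sketch of what mixing tagsets looks like in practice is given below. It assumes the TEI P5 namespace and a <formula> element carrying MathML; the sample paragraph and the function name are hypothetical. XML namespaces are what keep the two vocabularies apart.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for TEI (P5) and MathML; the prefixes are local choices.
NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "m": "http://www.w3.org/1998/Math/MathML",
}

# Hypothetical paragraph with a MathML expression inside a TEI <formula>.
SAMPLE = """
<p xmlns="http://www.tei-c.org/ns/1.0">Body length is given by
  <formula notation="mathml">
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>L</mi><mo>=</mo><mn>2</mn><mi>w</mi>
    </math>
  </formula>.
</p>
"""


def mathml_tokens(xml_text):
    """Return the text of each MathML token found inside TEI <formula> elements."""
    para = ET.fromstring(xml_text)
    tokens = []
    for formula in para.findall("tei:formula", NS):
        math = formula.find("m:math", NS)
        if math is not None:
            tokens.extend(child.text for child in math)
    return tokens


print(mathml_tokens(SAMPLE))
```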

The project website has more detail.

Examples

Examples cited in the supplied documentation include Jane Eyre, though TEI Lite can also cope with technical documents and has specific tags for mathematical formulae, sample program code, etc.

Projects using TEI don't quite range from A to Z, but they nearly make it, starting from African American Women Writers of the 19th Century and ending at Wright American Fiction 1851-1875.

Tools

A variety of tools are mentioned in the TEI wiki, including many generic XML editors such as Oxygen and Notepad++. TEI is so widely used that its schemas are included in several XML editors, including Editix and XML Copy Editor. In addition, XSLT stylesheets are available to convert TEI XML to LaTeX and other formats, and there is an OpenOffice plug-in to import and export TEI.

Summary

TEI is a well-established and widely used XML schema. It is used as a staging post in the encoding of documents in taXMLit, and so has proven suitable for use in the biodiversity domain without the need for customisation.

TEI and taXMLit

The taXMLit / INOTAXA project uses TEI Lite as a first step towards a full markup in taXMLit. In order to undertake the initial conversion of a document to TEI Lite, rules have to be developed for the taxonomy-specific components of the text; these rules can be modified to accommodate the different editorial styles of the source publications.
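The kind of rule involved might look something like the Python sketch below, which wraps abbreviated genus names in <expan> tags. This is not the taXMLit / INOTAXA rule set: the regular expression, the lookup table and the function name are all hypothetical, chosen only to show how a rule can be adjusted per publication by changing the pattern or the table.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical lookup of genus abbreviations for one source publication.
GENUS_ABBREVIATIONS = {"A.": "Attelabus"}

# Hypothetical rule: a capital initial and full stop, then a lower-case epithet.
ABBREVIATED_BINOMIAL = re.compile(r"\b([A-Z]\.)\s+([a-z]+)")


def expand_genera(text, abbreviations=GENUS_ABBREVIATIONS):
    """Wrap known abbreviated genus names in <expan>, leaving unknown ones alone."""
    safe = escape(text)  # escape &, < and > before adding any markup

    def replace(match):
        genus = abbreviations.get(match.group(1))
        if genus is None:
            return match.group(0)  # no expansion known: keep the original text
        return f"<expan>{escape(genus)}</expan> {match.group(2)}"

    return ABBREVIATED_BINOMIAL.sub(replace, safe)


print(expand_genera("A. viridens was compared with B. niger."))
```

A different editorial style, for example one that abbreviates genera to two letters, would need only a different pattern and table, with the rest of the conversion left unchanged.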

Comments

GENIA and TEI

The GENIA corpus, which we often refer to as an exemplar corpus for what we would like to achieve for biodiversity, has support for converting GPML to TEI markup.
