TEI marked up deliverables

Purpose

Among the deliverables of the ABLE project are ten marked up biodiversity texts.

TEI is our chosen markup language because it is a widely used and well-supported format. It was also the input format used by Data Conversion Laboratory to generate the first pass of the BCA in taXMLit format for the INOTAXA project.

This page records our work in producing TEI marked up documents from source material made available in the BHL by the NHM.

Creating TEI XML

TEI documents contain two sections: metadata in a <teiHeader> section, and the body of the document itself in a <text> section.
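The two-part structure described above can be sketched as follows. This is only an illustration, not one of the project's files: it assumes the TEI P5 namespace (the project's own documents may use a different TEI version), and the function name `build_tei_skeleton` and the placeholder paragraph text are invented for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: TEI P5 namespace; the project's files may differ.
TEI = "{http://www.tei-c.org/ns/1.0}"


def build_tei_skeleton(title):
    """Return a minimal TEI element: a teiHeader plus an empty text body."""
    tei = ET.Element(TEI + "TEI")

    # Metadata section.
    header = ET.SubElement(tei, TEI + "teiHeader")
    file_desc = ET.SubElement(header, TEI + "fileDesc")
    title_stmt = ET.SubElement(file_desc, TEI + "titleStmt")
    ET.SubElement(title_stmt, TEI + "title").text = title
    pub_stmt = ET.SubElement(file_desc, TEI + "publicationStmt")
    ET.SubElement(pub_stmt, TEI + "p").text = "Placeholder publication details"
    source_desc = ET.SubElement(file_desc, TEI + "sourceDesc")
    ET.SubElement(source_desc, TEI + "p").text = "Placeholder source details"

    # Document section, left empty here.
    text = ET.SubElement(tei, TEI + "text")
    ET.SubElement(text, TEI + "body")
    return tei


if __name__ == "__main__":
    ET.register_namespace("", "http://www.tei-c.org/ns/1.0")
    print(ET.tostring(build_tei_skeleton("Sample title"), encoding="unicode"))
```

Running the script prints a skeleton in which <teiHeader> and <text> are the only two children of the root element.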

The <teiHeader> section is completed by an XSL transformer (dc2teiHeader.xsl) that converts Dublin Core XML into the appropriate TEI elements. (See attached file DCtoTEI_notes.odt for a table of elements mapped between the two schemas.) The transformer produces an XSL file, teiHeader.xsl, which is used in the next step to populate the <text> section.
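To make the idea of mapping Dublin Core elements onto TEI header elements concrete, here is a small Python sketch. It is not the dc2teiHeader.xsl transformer, and the mapping table is a guess for illustration only; the actual mapping is the one recorded in DCtoTEI_notes.odt. The function name `dc_to_tei_header`, the TEI P5 namespace and the wording of the source description are likewise assumptions.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
TEI = "{http://www.tei-c.org/ns/1.0}"  # assumption: TEI P5

# Hypothetical mapping: DC element -> (fileDesc child, TEI element).
# The project's real mapping is documented in DCtoTEI_notes.odt.
DC_TO_TEI = {
    "title": ("titleStmt", "title"),
    "creator": ("titleStmt", "author"),
    "publisher": ("publicationStmt", "publisher"),
    "date": ("publicationStmt", "date"),
}


def dc_to_tei_header(dc_xml):
    """Build a teiHeader element from a Dublin Core XML string."""
    dc_root = ET.fromstring(dc_xml)
    header = ET.Element(TEI + "teiHeader")
    file_desc = ET.SubElement(header, TEI + "fileDesc")
    sections = {
        name: ET.SubElement(file_desc, TEI + name)
        for name in ("titleStmt", "publicationStmt", "sourceDesc")
    }
    # Copy every matching DC element into its assumed TEI counterpart.
    for dc_name, (section, tei_name) in DC_TO_TEI.items():
        for element in dc_root.iter(DC + dc_name):
            ET.SubElement(sections[section], TEI + tei_name).text = element.text
    ET.SubElement(sections["sourceDesc"], TEI + "p").text = (
        "Converted from Dublin Core metadata"
    )
    return header
```

A table-driven approach like this mirrors what an XSL template per source element would do, and keeps the mapping itself easy to compare against the notes file.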

The <text> section is completed by an XSL transformer (djvu2tei.xsl) that converts DjVu XML files into the appropriate TEI elements and includes the metadata in the teiHeader.xsl produced by the previous step. (See attached file DjVutoTEI_notes.odt for a table of elements mapped between the two schemas.)
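The kind of conversion this step performs can be sketched in Python. This is not djvu2tei.xsl: it assumes the usual DjVu XML layout of OBJECT (one per page), PARAGRAPH, LINE and WORD elements, and an invented mapping of pages to <pb/> and paragraphs to <p>. The real mapping is the one in DjVutoTEI_notes.odt, and the function name `djvu_to_tei_text` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # assumption: TEI P5


def djvu_to_tei_text(djvu_xml):
    """Convert a DjVu XML string into a TEI text element.

    Assumed mapping (illustrative only):
      OBJECT (page)  -> <pb n="..."/>
      PARAGRAPH      -> <p>
      LINE / WORD    -> words joined with single spaces
    """
    root = ET.fromstring(djvu_xml)
    text = ET.Element(TEI + "text")
    body = ET.SubElement(text, TEI + "body")
    for page_no, page in enumerate(root.iter("OBJECT"), start=1):
        # Mark the start of each scanned page.
        ET.SubElement(body, TEI + "pb", n=str(page_no))
        for paragraph in page.iter("PARAGRAPH"):
            words = [word.text or "" for word in paragraph.iter("WORD")]
            ET.SubElement(body, TEI + "p").text = " ".join(words)
    return text
```

Word coordinates and line breaks are discarded in this sketch; a fuller conversion might keep line breaks as <lb/> elements.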

Two sample files marked up in TEI are attached to this page. Both are taken from the Bulletin of the Natural History Museum: Entomology series.

Future work

The DjVu-based work described above has been superseded by the use of ABBYY XML as the main input for the text. This change was made because ABBYY XML contains typographic cues that DjVu XML lacks. The DC to teiHeader XSL remains in use.

The attached XML and XSL files carry the suffix _.txt so that they can be attached to this page.

At right is a thumbnail (click to see the full-size image) of the workflow that takes data from Dublin Core and DjVu to produce a basic TEI file, together with the semantic enhancement that would be applied as the next step on the way to full taXMLit markup.

Attachment                                 Size
dc2teiHeader.xsl_.txt                      4.74 KB
djvu2tei.xsl_.txt                          1.13 KB
bulletinofbritis49entolond_tei.xml_.txt    1.47 MB
bulletinofbritis50entolond_tei.xml_.txt    1.15 MB
DCtoTEI_notes.odt                          16.5 KB
DjVutoTEI_notes.odt                        11.57 KB