Marked-up Journals

Interim Output

This page contains automatically generated TEI versions of our eleven target journals.

The output is automatically derived from available Dublin Core and text data. It has not been manually corrected or enhanced.

The next step is to enhance the XML with semantic data by:

  1. automatically marking up the taxon names found using the FindIT algorithm
  2. automatically marking up other proper names, such as countries, found using the OpenCalais service
  3. using word clustering techniques to apply the FindIT and OpenCalais results to orthographic variants
  4. manually reviewing the output, especially the document metadata in the teiHeader section
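
As a rough illustration of the first two steps, the sketch below wraps already-identified names in TEI name elements. It does not call FindIT or OpenCalais; it assumes their results are available as a plain list of strings. The function name mark_up_names, its parameters, and the choice of a name element with a type attribute are assumptions made for illustration, not the project's actual scripts or markup decisions.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def mark_up_names(text, names, name_type):
    """Wrap each known name found in plain text with a TEI <name> element.

    `names` is assumed to be a list of strings already returned by a
    name-finding service; `name_type` (e.g. "taxon", "country") is a
    hypothetical value for the type attribute.
    """
    if not names:
        return escape(text)
    # Try longer names first so a binomial wins over its genus alone.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b"
    )
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches so the result stays
        # well-formed XML.
        parts.append(escape(text[last:match.start()]))
        parts.append(
            "<name type=%s>%s</name>"
            % (quoteattr(name_type), escape(match.group(1)))
        )
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    sample = "Apis mellifera occurs in Kenya & Uganda."
    print(mark_up_names(sample, ["Apis mellifera", "Apis"], "taxon"))
```

Matching on exact strings like this would miss orthographic variants, which is why step 3 is needed before the results can be applied across the whole text.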

These tasks have already been demonstrated with proof-of-concept scripts. The remaining work is to productionise the scripts so that they can be run repeatedly and reliably.

Outstanding workflow tasks
  1. Talk further with Anton on how best to integrate his Latin language detection routines into the workflow.
  2. Revisit ABBYY XML for typographical data such as italic text.
Attachment                               Size
bulletinofbritis27zoollond_tei.xml_.txt  1.07 MB
bulletinofbritis28zoollond_tei.xml_.txt  1.22 MB
bulletinofbritis35zoollond_tei.xml_.txt  1.14 MB
bulletinofbritis36zoollond_tei.xml_.txt  1.23 MB
bulletinofbritis44zoollond_tei.xml_.txt  933.48 KB
bulletinofbritis49entolond_tei.xml_.txt  1.47 MB
bulletinofbritis50entolond_tei.xml_.txt  1.15 MB
bulletinofbritis51entolond_tei.xml_.txt  1.28 MB
bulletinofbritis50zoollond_tei.xml_.txt  919.9 KB
bulletinofbritis52entolond_tei.xml_.txt  1.33 MB
bulletinofbritis53entolond_tei.xml_.txt  1.04 MB