Applying named entities

Background

OpenCalais is a set of tools developed and provided by Thomson Reuters to create semantic metadata for content.

The service is primarily intended to enable semantic enhancement of general internet text, such as blogs, rather than scientific works. This has influenced the entity types that the service can recognise.

If OpenCalais can be said to have a specialist knowledge domain, it is general news and current affairs. While identifying an EntertainmentAwardEvent might not interest us, identifying a Country does. OpenCalais might therefore offer us a means to add semantic metadata to our texts.
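Keeping only the entity types that matter to us, such as Country, and discarding types such as EntertainmentAwardEvent, can be illustrated with a minimal Python sketch. It assumes the entities returned by OpenCalais have already been parsed into (type, name) pairs; calling the service and parsing its response are not shown. The function name `filter_entities`, the whitelist entry "City" and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical whitelist of entity types worth keeping for our texts.
# "Country" comes from the post; "City" is an illustrative assumption.
TYPES_OF_INTEREST = {"Country", "City"}


def filter_entities(entities, wanted=TYPES_OF_INTEREST):
    """Keep only entities whose type is in `wanted`, grouped by type.

    `entities` is assumed to be an iterable of (entity_type, name) pairs
    already parsed from an OpenCalais response (parsing is not shown).
    Duplicate names are collapsed; names are returned sorted.
    """
    kept = defaultdict(set)
    for entity_type, name in entities:
        if entity_type in wanted:
            kept[entity_type].add(name)
    return {t: sorted(names) for t, names in kept.items()}


# Made-up sample data, not taken from the attached output files.
sample = [
    ("Country", "Brazil"),
    ("EntertainmentAwardEvent", "Some Award Ceremony"),
    ("Country", "Peru"),
    ("Country", "Brazil"),
]
print(filter_entities(sample))  # {'Country': ['Brazil', 'Peru']}
```

A whitelist rather than a blacklist is used here on the assumption that only a handful of the service's many entity types are relevant to scientific texts.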

More detail

The attached OpenCalais_notes.odt describes our experience with the service and the lessons we learned from it.

Attachments

All files mentioned in OpenCalais_notes.odt are attached to this post.

Sample output

Sample output is attached to this post: *_opencalais*.txt files list the entities found, and *_tei_annotated_oc.xml files (uploaded with an extra _.txt suffix) contain the marked-up TEI XML.
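The attached TEI files were produced by the PHP scripts listed below, and the sketch here does not reproduce them. It is only a minimal Python illustration of the general idea of marking up found entities in text, assuming the entities are supplied as a mapping from surface string to entity type, and that each match is wrapped in a TEI `<name type="...">` element. The element choice, the function name `annotate` and the sample strings are our assumptions, not necessarily what the attached output uses.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def annotate(text, entities):
    """Wrap each occurrence of a known entity name in a TEI <name> element.

    `entities` maps a surface string (e.g. "Brazil") to its entity type
    (e.g. "Country"). Longer names are tried first so that a name such as
    "New Guinea" is not split by a shorter name it contains.
    """
    if not entities:
        # Nothing to mark up; still escape &, < and > for XML output.
        return escape(text)
    names = sorted(entities, key=len, reverse=True)
    # \b word boundaries stop matches inside longer words.
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    out = []
    pos = 0
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))  # plain text before the match
        name = m.group(1)
        out.append(f"<name type={quoteattr(entities[name])}>{escape(name)}</name>")
        pos = m.end()
    out.append(escape(text[pos:]))  # trailing plain text
    return "".join(out)


print(annotate("Specimens from Brazil & Peru.", {"Brazil": "Country", "Peru": "Country"}))
# Specimens from <name type="Country">Brazil</name> &amp; <name type="Country">Peru</name>.
```

Plain string matching like this ignores context, so it would tag every occurrence of a name; a real pipeline would more likely use the offsets reported for each entity instance.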

Attachment                                              Size
OpenCalais_notes.odt                                    24.45 KB
run_opencalais_v2.php_.txt                              7.08 KB
run_opencalais_v3.php_.txt                              9.3 KB
oc_output_formats.txt                                   14.28 KB
bulletinofbritis27zoollond_opencalais2.txt              4.9 KB
bulletinofbritis27zoollond_opencalais3.txt              1004 bytes
bulletinofbritis27zoollond_tei_annotated_oc.xml_.txt    1.43 MB
bulletinofbritis51entolond_opencalais3.txt              773 bytes
bulletinofbritis51entolond_tei_annotated_oc.xml_.txt    2 MB
bulletinofbritis51entolond_opencalais2.txt              2.42 KB