OpenCalais is a set of tools developed and provided by Thomson Reuters for creating semantic metadata for content.
The service is primarily intended to enable semantic enhancement of general internet text, such as blogs, rather than scientific works. This has influenced the entity types that the service can recognise.
If OpenCalais can be said to have a specialist knowledge domain, it is general news and current affairs. Identifying an EntertainmentAwardEvent might not interest us, but identifying a Country does, so OpenCalais may still offer us a means of adding semantic metadata to our texts.
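As a rough illustration of keeping only the entity types that matter to us, the sketch below filters a list of recognised entities against an allow-list. It is not taken from the attached PHP scripts: the `filter_entities` function, the `WANTED_TYPES` set, the `type`/`name` dictionary layout and the sample records are all assumptions made for this example, and the real OpenCalais response format may differ.

```python
from collections import defaultdict

# Hypothetical allow-list of entity types assumed to be useful for our texts.
WANTED_TYPES = {"Country", "City", "Person", "Organization"}


def filter_entities(entities, wanted_types=WANTED_TYPES):
    """Group entity names by type, keeping only the wanted types.

    `entities` is assumed to be a list of dicts with "type" and "name" keys.
    """
    grouped = defaultdict(set)
    for entity in entities:
        if entity["type"] in wanted_types:
            grouped[entity["type"]].add(entity["name"])
    # Sort names so the result is stable and easy to compare.
    return {etype: sorted(names) for etype, names in grouped.items()}


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        {"type": "Country", "name": "Kenya"},
        {"type": "EntertainmentAwardEvent", "name": "Some Award"},
        {"type": "Country", "name": "Brazil"},
        {"type": "Person", "name": "Charles Darwin"},
    ]
    print(filter_entities(sample))
```

An allow-list is used rather than a block-list so that any type the service adds later is ignored until we decide it is relevant.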
The attached OpenCalais_notes.odt details our experience with the service and what we learned from it.
All files mentioned in OpenCalais_notes.odt are attached to this post.
Sample output is also attached to this post: the *_opencalais*.txt files list the entities found, and the *_tei_annotated_oc.xml files contain the marked-up TEI XML.
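To give a feel for the annotation step, here is a minimal sketch that wraps each entity mention in plain text in a TEI `<name type="...">` element. It does not reproduce the markup in the attached files, which may use different elements or attributes; the `annotate` function and the `name`/`type` dictionary layout are assumptions made for this example.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def annotate(text, entities):
    """Wrap each known entity mention in a TEI <name type="..."> element.

    `entities` is assumed to be a list of dicts with "name" and "type" keys.
    """
    by_name = {e["name"]: e["type"] for e in entities}
    if not by_name:
        return escape(text)
    # Try longer names first so "South Africa" is preferred over "Africa".
    names = sorted(by_name, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    parts, last = [], 0
    for match in pattern.finditer(text):
        # Escape the untouched text so the result stays well-formed XML.
        parts.append(escape(text[last:match.start()]))
        name = match.group(1)
        parts.append("<name type=%s>%s</name>" % (quoteattr(by_name[name]), escape(name)))
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)
```

This only handles plain text; annotating text that is already inside a TEI document would need an XML-aware approach so that existing tags are not matched or broken.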
Attachment | Size |
---|---|
OpenCalais_notes.odt | 24.45 KB |
run_opencalais_v2.php_.txt | 7.08 KB |
run_opencalais_v3.php_.txt | 9.3 KB |
oc_output_formats.txt | 14.28 KB |
bulletinofbritis27zoollond_opencalais2.txt | 4.9 KB |
bulletinofbritis27zoollond_opencalais3.txt | 1004 bytes |
bulletinofbritis27zoollond_tei_annotated_oc.xml_.txt | 1.43 MB |
bulletinofbritis51entolond_opencalais3.txt | 773 bytes |
bulletinofbritis51entolond_tei_annotated_oc.xml_.txt | 2 MB |
bulletinofbritis51entolond_opencalais2.txt | 2.42 KB |