XSLTs to extract taxon names from a taXMLit file

The attached XSLTs let you recover taxon names from a file marked up in taXMLit.

In taXMLit, several elements contain taxonomic material, as explained on the taXMLit documentation page. These XSLTs work with the TaxonName element.

The XSLTs are:

  • extractTaxonName - simply retrieves the content of all TaxonName nodes
  • extractTaxonNameSortUnique - retrieves the content of all TaxonName nodes, and presents the results in alphabetical order with duplicate names removed
  • extractTaxonNamePartOne - retrieves the explicitly cited taxon names and writes them to an XML file
  • extractTaxonNamePartTwo - takes the XML file produced by PartOne, removes duplicates and sorts the taxon names, producing a text file.
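The attached stylesheets are not reproduced in this post, so the following is not a translation of them. It is a minimal PHP sketch, using the same DOM extension as the PHP script in the comments, of the result that extractTaxonNameSortUnique (or PartOne followed by PartTwo) is described as producing: every TaxonName, whitespace-normalized, de-duplicated and sorted. The function name uniqueSortedTaxonNames is hypothetical; the sketch takes the XML as a string to stay self-contained, and assumes the elements are named TaxonName with no namespace prefix.

```php
<?php
// Hypothetical helper: DOM-based "extract, de-duplicate, sort", written for
// illustration only; it is not a translation of the attached stylesheets.
function uniqueSortedTaxonNames(string $xml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $names = [];
    foreach ($dom->getElementsByTagName('TaxonName') as $node) {
        // Mimic XPath normalize-space(): trim, then collapse internal runs of
        // XML whitespace to one space, so line-wrapped copies compare equal.
        $name = preg_replace('/[ \t\r\n]+/', ' ', trim($node->textContent, " \t\r\n"));
        if ($name !== '') {
            $names[] = $name;
        }
    }

    $names = array_values(array_unique($names)); // drop exact duplicates
    sort($names, SORT_STRING);                   // byte-wise (strcmp-style) order
    return $names;
}
```

For a file on disk, DOMDocument::load() can be used in place of loadXML(). Note that SORT_STRING orders by byte value, so capitalized names sort before lower-case ones; xsl:sort may use a language-aware collation and order some names differently.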

The output of each XSLT is attached as the corresponding result_ file.

Notes:

  • the XSLTs remove superfluous whitespace and indentation present in the XML source. This step can be commented out; doing so does not change the text content that is retrieved.
  • the XML and XSLT files have a .txt extension because they could not be attached to this post with their original extensions. To use the XSLTs, change their extension to .xslt.
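The whitespace note above can be illustrated with an analogy in PHP, rather than a description of the attached stylesheets (which are not shown here). In PHP's DOM extension, setting preserveWhiteSpace to false before loading asks libxml2 to drop whitespace-only text nodes such as indentation, while the text inside elements like TaxonName is left alone. The helper name inspectWhitespace is hypothetical. Without a DTD, libxml2 decides which nodes are "blank" heuristically, so some whitespace in mixed content may be kept.

```php
<?php
// Hypothetical helper for illustration: loads $xml and returns the number of
// child nodes under the root element plus the text of every TaxonName.
function inspectWhitespace(string $xml, bool $preserveWhiteSpace): array
{
    $dom = new DOMDocument();
    // Must be set before loading. false asks libxml2 to drop whitespace-only
    // ("blank") text nodes, such as the indentation between elements.
    $dom->preserveWhiteSpace = $preserveWhiteSpace;
    $dom->loadXML($xml);

    $names = [];
    foreach ($dom->getElementsByTagName('TaxonName') as $node) {
        $names[] = $node->textContent;
    }
    return [$dom->documentElement->childNodes->length, $names];
}

$indented = "<doc>\n  <TaxonName>Carabus auratus</TaxonName>\n  <TaxonName>Abax ater</TaxonName>\n</doc>";
[$kept, $namesKept] = inspectWhitespace($indented, true);          // 5 children: 3 blank text nodes + 2 elements
[$stripped, $namesStripped] = inspectWhitespace($indented, false); // 2 children: the elements only
```

In both cases the TaxonName text is identical; only the indentation nodes between elements differ.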

Attachment                                 Size
extractTaxonNamePartOne.txt                806 bytes
extractTaxonNamePartTwo.txt                769 bytes
result_extractTaxonNamePartOne.txt         84.42 KB
result_extractTaxonNamePartTwo.txt         17.25 KB
extractTaxonName.txt                       676 bytes
result_extractTaxonName.txt                64.11 KB
extractTaxonNameSortUnique.txt             824 bytes
result_extractTaxonNameSortUnique.txt      22.42 KB

Comments

And in PHP/DOM too

If you would rather avoid the mysteries of XPath expressions, this PHP script also extracts the TaxonName nodes. It handles the XML file through the Document Object Model (DOM), so the document can be manipulated with the same techniques you would use on a web page. Note that this is a quick-and-dirty example: the input file, Gold_BCA.xml, is hard-coded in the script:


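<?php
// Load the taXMLit file into a DOM tree (the file name is hard-coded).
$dom = new DOMDocument();
if (!$dom->load('Gold_BCA.xml')) {
    exit("Could not load Gold_BCA.xml\n");
}

echo "Taxon names in Gold_BCA.xml are:\n";

// getElementsByTagName() returns every TaxonName element, in document order.
$taxonNames = $dom->getElementsByTagName('TaxonName');
foreach ($taxonNames as $node) {
    // textContent includes the text of any child elements as well.
    echo $node->textContent . "\n";
}
?>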
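For comparison with the XSLT approach, the same selection can also be written as a single XPath expression from PHP, via the DOMXPath class. This is a minimal sketch; the function name taxonNamesViaXPath is hypothetical. The predicate local-name() = 'TaxonName' matches the element whether or not it sits in an XML namespace; this post does not say whether taXMLit files declare one, so treat that as a precaution rather than a requirement.

```php
<?php
// Hypothetical helper: select TaxonName elements with XPath instead of
// getElementsByTagName(), returning their text in document order.
function taxonNamesViaXPath(string $xml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $xpath = new DOMXPath($dom);
    // local-name() ignores any namespace prefix or default namespace.
    $nodes = $xpath->query("//*[local-name() = 'TaxonName']");

    $names = [];
    foreach ($nodes as $node) {
        $names[] = $node->textContent;
    }
    return $names;
}
```

A plain '//TaxonName' would be enough for documents with no namespace, but it matches nothing once a default namespace is declared, which is a common source of the "mysteries" mentioned above.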
