Rory and I met with Richard Gartner and Gareth Knight at CeRch today, to catch up with their investigations into using GATE and OpenCalais to process the EAD outputs from AIM25.
Results look very encouraging. OpenCalais, in particular, returns a set of identified entities after processing (personal names, place names, corporate names), and Richard G has written regular expressions that locate these entities in the body of the EAD and wrap them in the appropriate EAD tags (<persname> etc.).
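To illustrate the general technique (this is not Richard G's actual set of expressions), here is a minimal Python sketch that takes a list of (name, type) pairs, such as an entity extractor might return, and wraps each occurrence in the plain-text content of a single EAD element like <p>. The entity type labels in EAD_TAGS are assumptions, and working on plain text rather than raw XML avoids accidentally matching inside existing tags or attributes.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical mapping from extractor entity types to EAD element names.
# The type labels on the left are assumptions, not a confirmed OpenCalais list.
EAD_TAGS = {
    "Person": "persname",
    "City": "geogname",
    "Country": "geogname",
    "Company": "corpname",
    "Organization": "corpname",
}


def wrap_entities(text: str, entities: list[tuple[str, str]]) -> str:
    """Wrap each (name, type) entity found in plain text in its EAD tag.

    Names are tried longest first, so "King's College London" wins over
    "London"; word boundaries stop matches inside longer words.
    Returns an XML fragment with the surrounding text escaped.
    """
    tagged = {name: EAD_TAGS[etype] for name, etype in entities if etype in EAD_TAGS}
    if not tagged:
        return escape(text)
    names = sorted(tagged, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))   # text before the entity
        tag = tagged[m.group(1)]
        out.append(f"<{tag}>{escape(m.group(1))}</{tag}>")
        pos = m.end()
    out.append(escape(text[pos:]))                  # trailing text
    return "".join(out)
```

For example, wrap_entities("Papers of Florence Nightingale relating to London.", [("Florence Nightingale", "Person"), ("London", "City")]) yields "Papers of <persname>Florence Nightingale</persname> relating to <geogname>London</geogname>."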
This suggests that the way forward for enhancing the existing AIM25 data entry processes will be to dispatch the EAD-compliant data entered by collections managers to OpenCalais, and to return the data, with enhanced markup, to the submitter for checking. This hook should be easy enough to insert for manual, form-based entry; for batch entry processes we will need to assess whether it introduces any significant delays.
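One way to keep the hook independent of the entity service, and to measure the delay it adds to batch loading, is sketched below in Python. It assumes nothing about the OpenCalais or GATE APIs: the extractor and markup steps are plain callables, and toy_extract and toy_markup are hypothetical stand-ins used only to show the shape of the workflow.

```python
import time
from typing import Callable, Iterable

# An extractor takes plain text and returns (name, entity_type) pairs.
# In practice this would call OpenCalais or GATE; no real API is assumed here.
Extractor = Callable[[str], list[tuple[str, str]]]
# A markup function inserts EAD tags for the extracted entities.
Markup = Callable[[str, list[tuple[str, str]]], str]


def enrich(text: str, extract: Extractor, markup: Markup) -> tuple[str, float]:
    """Return the enhanced text and the seconds spent on extraction + markup."""
    start = time.perf_counter()
    enhanced = markup(text, extract(text))
    return enhanced, time.perf_counter() - start


def enrich_batch(records: Iterable[str], extract: Extractor, markup: Markup):
    """Enhance a batch of records and report timings, so the delay added
    to batch entry can be assessed before the hook is switched on."""
    results, timings = [], []
    for text in records:
        enhanced, elapsed = enrich(text, extract, markup)
        results.append(enhanced)
        timings.append(elapsed)
    total = sum(timings)
    stats = {
        "records": len(timings),
        "total_s": total,
        "mean_s": total / len(timings) if timings else 0.0,
        "max_s": max(timings, default=0.0),
    }
    return results, stats


# Hypothetical stand-ins for the real extraction service and markup step.
def toy_extract(text: str) -> list[tuple[str, str]]:
    return [("London", "City")] if "London" in text else []


def toy_markup(text: str, entities: list[tuple[str, str]]) -> str:
    for name, _etype in entities:
        text = text.replace(name, f"<geogname>{name}</geogname>")
    return text
```

In the form-based case only enrich() is needed per submission; in the batch case the stats dictionary gives a first indication of whether the round trip to the service is tolerable.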
We've also started to consider ideas for a URI scheme for the identified entities. Our current working hypothesis is that this will involve defining a "data" namespace for AIM25, bound to http://data.aim25.ac.uk/. Within that we can develop a structure along the lines of /person, /place and /corporate_body, and append our unique ID for each entity. Further research is needed, particularly into the Cabinet Office's recommendations in Designing URI Sets for the UK Public Sector.
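Under that working hypothesis, minting a URI is a simple, deterministic function of entity class and local ID. The Python sketch below assumes the three path segments named above and a hypothetical local ID format (e.g. "p000123"); the base and segments are kept in one place so they can be revised once the Cabinet Office guidance has been reviewed.

```python
from urllib.parse import quote

# Working hypothesis: a "data" namespace for AIM25 with one segment per class.
BASE = "http://data.aim25.ac.uk/"
ENTITY_CLASSES = frozenset({"person", "place", "corporate_body"})


def mint_uri(entity_class: str, local_id: str) -> str:
    """Build an entity URI such as http://data.aim25.ac.uk/person/p000123.

    The local ID is percent-encoded (including "/") so that unusual
    characters cannot produce an invalid or ambiguous URI path.
    """
    if entity_class not in ENTITY_CLASSES:
        raise ValueError(f"unknown entity class: {entity_class!r}")
    if not local_id:
        raise ValueError("local ID must not be empty")
    return f"{BASE}{entity_class}/{quote(local_id, safe='')}"
```

Rejecting unknown classes, rather than silently accepting them, keeps the URI set closed to the structure the project has agreed on.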
These URIs can then be used in identifier attributes for our EAD elements (<persname>, etc.), and thence easily transformed into RDFa for the web-based HTML rendering of the AIM25 catalogues.
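As a rough illustration of that transformation, the Python sketch below rewrites EAD name elements into RDFa-annotated HTML spans using the standard library's ElementTree. Several choices here are assumptions: the URI is taken to sit in the EAD authfilenumber attribute (the post does not say which identifier attribute will be used), the fragment has no XML namespace, and the foaf/dcterms/rdfs terms are one possible vocabulary. The rendered page would also need to declare those prefixes, and other EAD elements (e.g. <emph>) would need their own mapping.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from EAD name elements to RDF classes for RDFa output.
RDFA_TYPES = {
    "persname": "foaf:Person",
    "geogname": "dcterms:Location",
    "corpname": "foaf:Organization",
}


def ead_to_rdfa(ead_fragment: str) -> str:
    """Turn EAD name elements carrying a URI into RDFa <span> elements.

    Elements without a URI are left untouched, so partially enhanced
    records still pass through safely.
    """
    root = ET.fromstring(ead_fragment)
    for el in list(root.iter()):          # snapshot before modifying
        uri = el.get("authfilenumber")
        if el.tag in RDFA_TYPES and uri:
            rdf_type = RDFA_TYPES[el.tag]
            el.tag = "span"
            el.attrib.clear()
            el.set("about", uri)          # subject: the entity URI
            el.set("typeof", rdf_type)    # its RDF class
            el.set("property", "rdfs:label")  # element text becomes its label
    return ET.tostring(root, encoding="unicode")
```

Applied to <p>Papers of <persname authfilenumber="http://data.aim25.ac.uk/person/p000123">Florence Nightingale</persname>.</p>, this produces a <p> whose name is wrapped in a span with about, typeof and property attributes; the surrounding text is preserved.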
Next steps include investigating further how to implement and assert relationships between our entities and those in other open datasets (e.g. our_entity is_the_same_as your_entity), and how to make the authority data, duly marked up, available as open metadata.
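One common way to express "our_entity is_the_same_as your_entity" in RDF is owl:sameAs. The Python sketch below serialises such links as N-Triples; the external URI in the example is a hypothetical placeholder. Because owl:sameAs asserts full identity, a weaker predicate such as skos:exactMatch can be passed instead where the match is less certain.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Characters not permitted inside an N-Triples IRI reference.
_FORBIDDEN = set('<>"{}|^`\\')


def _check_iri(iri: str) -> str:
    """Reject empty IRIs or those containing spaces/controls or forbidden characters."""
    if not iri or any(c in _FORBIDDEN or ord(c) <= 0x20 for c in iri):
        raise ValueError(f"not usable as an N-Triples IRI: {iri!r}")
    return iri


def same_as_ntriples(links: list[tuple[str, str]], predicate: str = OWL_SAME_AS) -> str:
    """Serialise (our_uri, their_uri) pairs as N-Triples, one triple per line."""
    pred = _check_iri(predicate)
    lines = [
        f"<{_check_iri(ours)}> <{pred}> <{_check_iri(theirs)}> ."
        for ours, theirs in links
    ]
    return "\n".join(lines) + ("\n" if lines else "")
```

For instance, same_as_ntriples([("http://data.aim25.ac.uk/person/p000123", "http://example.org/authority/12345")]) gives a single line linking the AIM25 URI to the (hypothetical) external one.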
Rory and I can now start to consider suitable approaches to embedding this in our development copy of the existing AIM25 system, and we'll continue to liaise closely with CeRch for advice on the relative merits of GATE and OpenCalais processing, and for guidance on URI implementation.