Friday, 2 September 2011

The OMP prototype ISAD(G) editor - How it works

This post describes the prototype workflow tool, how it can be used for recording AIM25 collection level records, and some associated technical issues.

General aims of prototype

  1. To improve on the usability of the existing offering
  2. To make use of semantic annotation within the tool
  3. To use linked data to enhance the user experience of the AIM25 website

Detailed aims

Aim 1.

  • a) Eliminate the need for archivists to use mark-up
  • b) Integrate the indexing process with the metadata recording process
  • c) Reduce page scrolling and generally improve usability

Aim 2.

  • a) Analyse the textual input against existing authoritative sources both external and internal
  • b) Suggest and record indexing terms derived from the analysis
  • c) Record the semantic properties of terms

Aim 3.

  • a) Mark up the semantic properties of indexed terms both within the ISAD(G) display and within the “Access points” lists
  • b) Provide links to related services based on the semantic properties of the terms

Technological details

Aim 1

For the prototype we took a snapshot of the AIM25 database and put this on the OMP test server (http://data.aim25.ac.uk).

An alternative workflow for adding and editing collection level records was built in PHP, using this snapshot as a data source. The JavaScript framework jQuery was also used to control on-screen actions and provide a bit of dynamism. The ISAD(G) elements are grouped into the areas described by documents like this http://is.gd/GEXdG2, with each area given a tab.

The access points are displayed on the right. Terms are colour coded according to the four term types:
  • person
  • place
  • organisation
  • concept

Each term classified under these types is represented by the OMP data-browser, which parses RDF that is in turn derived from the existing database (more on that later). Archivists can remove access points by dragging and dropping terms into the dustbin icon.

Aim 2

The prototype workflow uses one external service (OpenCalais) directly to analyse text and suggest useful terms for indexing. AIM25's existing index, a dataset that includes the UK Archival Thesaurus, is also interrogated. To do this, an AIM25 text analysis service was developed.

For rapid development this service runs boring old SQL on the existing AIM25 data tables, but as there is already a mechanism to transpose this data into RDF (and more), a more robust semantic solution is theoretically a short hop away.
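To give a feel for the "boring old SQL" approach, here is a minimal sketch in Python using an in-memory SQLite database. It is an illustration only, not the AIM25 service itself: the table name index_terms, its columns, and the sample rows other than "Weaving" are assumptions made for the example, since the real AIM25 schema is not described in this post.

```python
import sqlite3


def build_demo_index():
    """Create a tiny in-memory stand-in for an index table.

    The table name, columns and rows are hypothetical; they are not
    the real AIM25 schema or data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE index_terms (term TEXT, term_type TEXT)")
    conn.executemany(
        "INSERT INTO index_terms VALUES (?, ?)",
        [
            ("Weaving", "concept"),
            ("London", "place"),
            ("Textile industry", "concept"),
        ],
    )
    return conn


def suggest_terms(conn, text):
    """Return (term, term_type) rows whose term occurs in the text.

    Matching is a plain case-insensitive substring test done in SQL,
    so it finds exact matches only.
    """
    rows = conn.execute(
        "SELECT term, term_type FROM index_terms "
        "WHERE instr(lower(?), lower(term)) > 0 "
        "ORDER BY term",
        (text,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    demo = build_demo_index()
    for term, term_type in suggest_terms(demo, "Papers relating to weaving in London"):
        print(term, term_type)
```

A plain substring test like this will also match inside longer words, so a real service would probably want word-boundary checks or a proper thesaurus lookup on top.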

Archivists can use check-boxes by the side of each textarea in the workflow form to select ISAD(G) elements for analysis. The selected text is sent for analysis by one or both of the services and the results are displayed in two ways:
  • As embedded mark-up in the "textarea"
  • As term lists in the Analysis/Indexing area

Above is an example of a list of terms returned from the AIM25 service. The term "Weaving" is in the process of being added as an access point for this record.


Here we see the same results embedded in the text. These are a smaller set as they only include the exact matches. When saved, terms not added to the access points are stripped out. Those that remain can be represented in context as RDFa.
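The "strip out on save" step can be sketched as follows. This is a rough Python illustration under stated assumptions, not the prototype's PHP code: the span format with class "term" and a data-term attribute is a hypothetical stand-in for whatever mark-up the editor actually embeds, and strip_unaccepted is a made-up helper name.

```python
import re

# Hypothetical embedded mark-up: <span class="term" data-term="X">text</span>.
# The real prototype's mark-up format is not shown in this post.
TERM_SPAN = re.compile(r'<span class="term" data-term="([^"]*)">(.*?)</span>')


def strip_unaccepted(marked_up, access_points):
    """Unwrap suggested terms that were not added as access points.

    Spans whose data-term is in access_points are kept untouched;
    all other spans are replaced by their inner text.
    """
    def keep_or_unwrap(match):
        term, inner = match.group(1), match.group(2)
        return match.group(0) if term in access_points else inner

    return TERM_SPAN.sub(keep_or_unwrap, marked_up)


if __name__ == "__main__":
    text = (
        'Records of <span class="term" data-term="Weaving">weaving</span> '
        'in <span class="term" data-term="London">London</span>.'
    )
    print(strip_unaccepted(text, {"Weaving"}))
```

The spans that survive are the ones that could then be rewritten as RDFa attributes; a regular expression is enough for a flat format like this, but nested mark-up would call for a real HTML parser.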


Here the results returned by OpenCalais are embedded and below they are displayed as a list so that they can be added to the access points. Also below are the results of a direct lookup on the AIM25 service so that archivists can add access points for terms that do not appear in the text.



Did we achieve any of Aim 3? More to follow on this soon...
