The OMP project showed that initial professional scepticism can be overcome if Linked Data can be simply defined and the benefits clearly set out. Archivists will use Linked Data if a service or services are provided that automate or simplify mark-up, or the semantic process more generally, and embed it within existing cataloguing workstreams. Ideally, these can be built out of trusted aggregations, authorities or cataloguing systems such as the Hub, AIM25, TNA, CALM or ATOM. They are less likely to use Linked Data if it is perceived to be a complex, though potentially useful, add-on requiring detailed specialist knowledge and delivered without support or guidance ('build it and they will come'). The ability to retroconvert legacy catalogues and CLDs with Linked Data through automation against OpenCalais and other engines will help sell Linked Data more effectively, as will validation of metadata created out of mass digitisations and OCR.
The OMP project has underlined the value of Linked Data in a number of ways:
- Increased access and discovery
- Increased use and return on investment in cataloguing (speeding up cataloguing, enabling tools that help an archivist to locate and link information - for example indexing, finding already-existing authority records and linking to them; finding suitable subject terms; locating places from GeoNames or similar)
- Enhanced ability to justify expenditure on services and resource development (improved web-hits and connecting with heavily used services)
- Exposure of information to novel and different uses (combining ALM collections for the delivery of services, including commercial services - apps, exhibitions, mapping, new tools etc.)
The updated workflow interface includes:
- Reduction of the requirement for archivists to input HTML
- Reduction of the on-screen size of the form
- Integration of the process of selecting access points
- Automatic semantic annotation to aid selection of classifying terms
- Authority lookup (internal and external - UKAT, GeoNames, etc) to improve rigour of metadata
- SKOS representation of AIM25-UKAT data
- RDF for AIM25 people, families and corporate names
- GeoNames representation of AIM25 place data
- Semantic lookup allowing users to further explore definitions and instances of terms based on the properties defined during the workflow process.
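As an illustration of what a SKOS representation of a thesaurus term involves, the following is a minimal sketch in Python. It is not the AIM25 implementation: the `Concept` class, the `to_turtle` function and the `example.org` URIs are all assumptions made for this example, and the labels are illustrative rather than taken from UKAT. Only the SKOS namespace and the `skos:prefLabel`, `skos:altLabel` and `skos:broader` properties are standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Standard SKOS namespace declaration for a Turtle document.
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


@dataclass
class Concept:
    """One thesaurus term (hypothetical structure, not the AIM25 data model)."""
    uri: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    broader: list[str] = field(default_factory=list)


def _escape(text: str) -> str:
    # Escape backslashes and double quotes for a Turtle string literal.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(concept: Concept, lang: str = "en") -> str:
    """Serialise a single concept as a small Turtle document."""
    pairs = [
        "a skos:Concept",
        f'skos:prefLabel "{_escape(concept.pref_label)}"@{lang}',
    ]
    pairs += [f'skos:altLabel "{_escape(a)}"@{lang}' for a in concept.alt_labels]
    pairs += [f"skos:broader <{b}>" for b in concept.broader]
    body = " ;\n    ".join(pairs)
    return f"{SKOS_PREFIX}\n\n<{concept.uri}> {body} ."


if __name__ == "__main__":
    # Illustrative term and URIs only.
    hospitals = Concept(
        uri="http://example.org/thesaurus/hospitals",
        pref_label="Hospitals",
        alt_labels=["Infirmaries"],
        broader=["http://example.org/thesaurus/health-services"],
    )
    print(to_turtle(hospitals))
```

In practice an RDF library would normally handle serialisation; plain string building is used here only to keep the sketch self-contained.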
The main business case is two-fold: adding value and boosting efficiency. Archivists are very attracted to the idea of enabling UKAT in Linked Data, but as an active service like OpenCalais rather than a passive look-up. AIM25 has developed a SKOS version of UKAT and a workflow tool that would link from a revised AIM25 data entry template to an LD UKAT.
Of place, personal name, corporate name and subject, subject terms are arguably the most subjective, requiring the archivist to exercise judgment on the preferred term with the collection and potential users in mind. OMP has shown that subject terms throw up the least accurate semantic returns from a linguistic analysis service such as OpenCalais (places can often be matched with absolute precision, as can personal names). OMP has improved professional efficiency by developing a hover tool to enable the archivist to select a preferred subject term from UKAT or via connecting to LD versions of LCSH/NRA and to add this term or terms to their new catalogue/CLD.
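The step of mapping a suggested term to a preferred thesaurus term can be sketched as follows. This is a simplified illustration, not the OMP hover tool: the `THESAURUS` contents, `build_index` and `suggest_preferred` are hypothetical names and data invented for the example. It assumes candidate terms arrive as plain strings (for instance from a linguistic analysis service) and that matching is a case-insensitive comparison against preferred and alternative labels. The final choice is left to the archivist.

```python
# Illustrative thesaurus: preferred label -> alternative (non-preferred) labels.
# These entries are invented for the example and are not real UKAT data.
THESAURUS = {
    "Medical sciences": ["Medicine"],
    "Hospitals": ["Infirmaries"],
    "Nursing": ["Nursing care"],
}


def build_index(thesaurus):
    """Map every label (preferred or alternative) to its preferred term(s)."""
    index = {}
    for pref, alts in thesaurus.items():
        for label in [pref, *alts]:
            index.setdefault(label.casefold(), set()).add(pref)
    return index


def suggest_preferred(candidates, index):
    """Return, for each candidate term, a sorted list of preferred terms.

    An empty list means no match was found, so the archivist would need
    to choose a term manually.
    """
    return {
        candidate: sorted(index.get(candidate.strip().casefold(), ()))
        for candidate in candidates
    }


if __name__ == "__main__":
    index = build_index(THESAURUS)
    for term, options in suggest_preferred(["medicine", "Railways"], index).items():
        print(term, "->", options or "no match, choose manually")
```

Exact label matching is the simplest possible strategy; a real service would probably also need stemming or fuzzy matching, which is where subject terms become harder to handle than places or personal names.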
Without such automation, Linked Data won't be embedded or the data linked will be limited in scope. Flexibility is key. Focus group archivists concluded that they need the ability to analyse as much or as little of a description as they need, and to reach that faceting decision as speedily as possible. That means selecting both the most important entities that require linking in any body of text and the fields to be analysed: just 'creator', 'institution' etc., or also terms within Scope and Content or Admin/Biographical? The value of broader authority data was reiterated by the archivists - analysis should not be limited to Scope and Content. A fundamental point is that back-end Linked Data enhancement works best when it works with the grain of professional practice - pragmatically and speedily.
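The idea of analysing as much or as little of a description as needed can be illustrated with a small sketch. The field names (`creator`, `scope_and_content`, `admin_history`), the `DEFAULT_FIELDS` setting and the `build_payload` function are assumptions made for this example, not the AIM25 template or any service's API. The function only prepares the text for the fields the archivist has chosen; sending it to an annotation service is left out.

```python
# Fields analysed when the archivist makes no explicit choice (an assumption).
DEFAULT_FIELDS = ("scope_and_content",)


def build_payload(record, fields=DEFAULT_FIELDS):
    """Return (field name, text) pairs for the chosen fields.

    Fields that are missing or contain only whitespace are skipped, so
    nothing empty would be sent for analysis.
    """
    payload = []
    for name in fields:
        text = record.get(name, "").strip()
        if text:
            payload.append((name, text))
    return payload


if __name__ == "__main__":
    # Invented record for illustration.
    record = {
        "creator": "Example Hospital",
        "scope_and_content": "Minute books and correspondence.",
        "admin_history": "",
    }
    print(build_payload(record))
    print(build_payload(record, ("creator", "admin_history")))
```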
The OMP approach is innovative in that it offers further exposure of data - and all AIM25 data has been processed as part of the project. Sustainability will be maintained going forward either by periodic manual data dumps into OpenCalais or by automated calendared refreshes - the same approach could be envisaged for LD UKAT as a national service plugged into local systems such as CALM. Improving the OpenCalais vocabulary by importing archive-specific terms is crucial to the success of mark-up. Analysis of the catalogue data is only valuable if OpenCalais learns from archivists. Until this happens, the breadth of vocabulary will limit the scope of the mark-up. It is also worth putting pressure on the main suppliers of archival cataloguing software to encourage them to embed support in periodic upgrades.
Experimentation with NRA data is ongoing - this will test how difficult it would be to build an authorities service off the NRA/ARCHON. The results will be described in a separate blog post.