JISC Step change will create a Linked Data architecture for the UK archive sector, completing in July 2012. It draws on the lessons of Open Metadata Pathway and brings together King's College London Archives, ULCC, Axiell, Cumbria Archive Service, Historypin and the charity, 'We are what we do'. The project will use data held by AIM25 and will focus on delivering a new UKAT Web service and toolset that will allow archivists to mark up catalogues with triples and other semantic entities.
This post will examine the steps that other practitioners might need to take to exploit the potential of Linked Data, based on the experiences of the OMP project team.
Developer liaison: The focus group stage of the project was particularly valuable for bringing technical support together with busy archivists in a workshop setting to understand how semantic markup might be incorporated into archival workflow and best practice. The project has highlighted once again that successful development depends on technical developers having a high-level understanding of archival principles, facilitated through this kind of hands-on information exchange. Advice: To embed Linked Data successfully in normal business activity, developers must appreciate how archive catalogues are compiled by archivists and used by a variety of audiences.
Front-end development: To get full value out of semantic markup, careful thought needs to be given to how archival websites must be adapted to express Linked Data entities and the connections they make with external data sources. Advice: Institutional IT and web support need to be made aware of the value of Linked Data and of the challenges in potentially redesigning websites to express these new relationships.
Data quality: Semantic markup exposes the deficiencies in existing data, and sufficient archival staff time must be set aside to handle the inevitable audit, cleansing and editing required of catalogue and index data. Linked Data approaches can streamline workflows but are not a magic solution - knowledge of collections, context and provenance remains central to the work of the archivist. Advice: Time must be built into any programme to bring archivists up to speed with Linked Data and give them the opportunity to undertake mark-up.
Resources needed: the primary resources required are staff training and awareness of Linked Data, and access to the mark-up tools necessary to add Linked Data to catalogues in a seamless way. These tools should be freely accessible and intuitive to minimise the requirement for extensive (and expensive) training. Access to UKAT or to similar appropriate thesauri is advisable so that RDF versions of subject, personal, corporate and place names can be added to entries with minimum referral to external vocabularies (the 'research' phase of writing or editing catalogues). CALM and other software providers are currently developing embeds for these tools for their UK customers. Time taken for mark-up will differ according to the quality and length of the existing entry and the granularity of indexing, but between 6 and 10 page-length collection level descriptions might reasonably be processed in an hour. Advice: key resources are staffing, training and IT. The potential of Linked Data provides a powerful test case for improved access to put to cataloguing funders, and can boost the opportunity to acquire extra cataloguing resources.
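To make the idea of adding RDF versions of index terms to an entry more concrete, here is a minimal sketch of what the resulting triples might look like. It is an illustration only, not the output of the UKAT service or of any cataloguing tool: the example.org URIs, the sample terms and the helper name to_ntriples are all hypothetical, and Dublin Core's dcterms:subject is used for every term purely for simplicity (a real tool would probably choose different properties for people, organisations and places).

```python
# Minimal sketch: express index terms on a catalogue entry as RDF triples,
# serialised in N-Triples syntax (one "<subject> <predicate> <object> ." per line).
# All example.org URIs and the sample terms are hypothetical placeholders.

DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"  # Dublin Core "subject" property


def to_ntriples(entry_uri, concept_uris, predicate=DCTERMS_SUBJECT):
    """Return one N-Triples line linking the entry to each concept URI."""
    return [f"<{entry_uri}> <{predicate}> <{c}> ." for c in concept_uris]


# Hypothetical collection-level description and thesaurus concepts.
entry = "http://example.org/archive/collection/123"
concepts = [
    "http://example.org/thesaurus/concept/nursing",
    "http://example.org/thesaurus/concept/hospitals",
]

lines = to_ntriples(entry, concepts)
print("\n".join(lines))
```

Each line links the catalogue entry to one thesaurus concept by URI rather than by a text string, which is what allows the entry to be connected to other collections indexed with the same concept.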
Prioritisation: Linked Data implementation works best when tailored to fit existing cataloguing backlogs and priorities - for example through ranking by intrinsic significance or the potential use of collections. Linked Data should not be an expensive, unrealistic add-on. Linked Data does, however, provide the opportunity for enhancement and enrichment through linking out to related collections and sources. The availability or non-availability of these external sources will inevitably result in an adjustment to the markup prioritisation. Advice: follow existing plans closely and embed Linked Data markup where appropriate. Produce a 'showcase' collection(s) to highlight potential to internal and external audiences and funders.
Engagement: OMP involved cooperation from archivists within AIM25 in a formal workshop setting, and informally via email lists and face-to-face meetings. Key engagement partners will necessarily include: fellow archivists (what can be learnt from the experience of other information professionals?); institutional IT support (what resources will be necessary to add RDF and express changes in a public website?); senior management (how much will this cost? what are the benefits to the organisation?); users (what do they want, what do they expect? Will their teaching, learning and research experience be improved?). Advice: archivists should attend training programmes and join listservs that provide training or support on Linked Data.
Summary of advice:
Think carefully about the added value that Linked Data might bring, for example speeding up indexing and so making closed collections accessible more readily and quickly. Write this up and quantify it using test material from priority collections to provide a real-time example of its value
Staff and stakeholder training is a key element: identify training opportunities through JISC and other organisations, conferences and hack-days, and plan for the training of new staff and cataloguers
Use available RDF indexing tools and embed in existing cataloguing practice. Listen out for new tools that are imminent, for example for CALM customers
Identify new audiences that can fully realise the potential of Linked Data. These might (indeed, ideally should) differ from existing audiences
Share best practice with fellow archivists
Collect feedback from users to inform priority list for semantic cataloguing (which data sources would be especially useful to them if connected?)
Showcase key collections and generate metrics to demonstrate enhanced take-up