Tuesday, 4 December 2012

Linked Data in Archives, Libraries and Museums Meeting, 3rd December 2012

I organised a roundtable meeting of around 35 archivists, curators and Linked Data specialists drawn from the UK cultural sector, who met at King's College on the afternoon of 3 December. The audience included representatives of major institutions such as the British Library, British Museum and Imperial War Museum, of AIM25 partner organisations, and of other key players including the Collections Trust, Mimas, JISC, Wikipedia, Historypin and Culture24. Software vendors were represented by Axiell CALM, Adlib and MODES.

The meeting focused on working out possible practical 'next steps' for Linked Data in archives, libraries and museums. It followed the completion of a number of successful projects over the past 18 months, including a clutch of JISC Discovery programme initiatives such as Step change, and it looked ahead to forthcoming events including the JISC Discovery meeting planned for February and the LODLAM conference in Montreal in summer 2013.

The meeting opened with a number of presentations. Gordon McKenna of Collections Trust reviewed Europeana initiatives, including the Linked Heritage project and a recent partner survey that revealed ongoing intellectual property (IP) worries in the sector over access to material. He raised the point that partner-publishers arguably need more content to connect to (successful Linked Data is not just about what you publish but what you consume). Understanding user requirements better was also a key concern.

Andrew Gray, Wikipedian in Residence at the British Library, described the exciting work currently being carried out on authority files and introduced 'Wikidata' - the new DBpedia. He stressed the value of controlled vocabularies within the ALM sector and the need to demystify the language used in Linked Data projects, as this was potentially putting off users.

Adrian Stevenson of Mimas reviewed the groundbreaking work of LOCAH, upon which Step change and other projects have built, and raised a number of important points, including the need for more, and easier-to-use, tools and the complexities of dealing with duplication, inconsistency and currency in the data. He called for more co-operation among cultural partners (not just ALM practitioners). Adrian rounded up by previewing the new World War One aggregation site, which, while not using Linked Data per se, is a good example of a cross-cultural aggregation project. In such projects, different archives will sometimes have variable levels of technical knowledge and expertise (for example concerning APIs) and consequently often need active support to make their data available.

Geoff Browell reviewed the Step change and Trenches to Triples projects and their rationale: to encourage the creation of archival Linked Data by making it part of the normal cataloguing/indexing process, and to do this by building editing and publishing tools into CALM, Adlib and other software commonly used by the archival community. The experience of Cumbria on the Step change project shows that users need to come first and that there is a real demand for the release of key datasets such as Victoria County History as Linked Data.

Bruce Tate of the Institute of Historical Research concluded the first session by previewing the enlarged 'Connected histories' project, whose API will soon be available to consumers, including a new georeferencing tool to map content held in British History Online. He reviewed a recent impact measurement survey, which chimed with several speakers in the meeting. They argued that the community needs more, and better quality, information on how Linked Data might help different audiences, including academics and the general public, in order to sell the concept (and secure the necessary investment) to internal audiences within institutions (senior management) and to funders like the Research Councils.

The second half of the meeting comprised three discussions led by leading practitioners.

Nick Stanhope of Historypin led on community engagement and the opportunity afforded by new tools being developed by Historypin to help crowdsource Linked Data - for example the verification of people, places and their relationships. He stressed the role of storytelling, which Linked Data ought to seek to capture.

Robert Baxter of Cumbria Archive Service and Step change argued that most archives need full Google visibility for their records as a starting point (which many do not currently have) and reiterated the need to sell Linked Data more effectively within institutions. A more intuitive 'stepping stones' approach is needed to support research discovery (something also raised by Nick Stanhope). Linked Data and other tools ought to support this view of research as exploration or journey.

Richard Light reviewed the important development work carried out on the MODES software and the reviews undertaken by CIDOC-CRM. He focused on next steps, raising the question of whose job it is to publish data and highlighting the value of an 'open ended distributed database of cultural history'. Among his recommendations were that:

· Publishers of authorities ought to publish as Linked Data as a matter of course
· Software vendors in the sector should be encouraged to provide some form of "web termlist" facility so that recorders can easily add Linked Data identifiers
· There needs to be agreement on the need for sector-specific guidelines for structuring Linked Data resources (the "mortar" in the "wall"), and ideally a working group actually producing some
· There should be an exploration of how we get "horizontal" resources for the common entity types (people, places, etc.) so we have some concepts/URLs we can actually share

Several key themes emerged from the afternoon:

· Advocacy: The role of case studies and impact assessments in building business cases for internal and sectoral/funder investment
· Audiences: A renewed focus on the user and consumer of data, their stories and research journey
· Accessibility: Simplifying data creation by involving vendors, minimising the variety of editing tools and using agreed master authorities to cut down URI duplication. Creating a registry of tools and developing suitable plug-ins and mediation services, but doing so on the basis of sector agreement rather than project-by-project. The Mellon-funded Research Base is one such initiative to minimise duplication.

Other themes included:

· Licensing - this remains a stumbling block due to a lack of clarity around Creative Commons licences - CC0 or CC-BY?
· Training - practitioners need technical support and training to get the best from Linked Data
· Cultural sector - this should be viewed in the round: not just archives, libraries and museums but the broader sector, including galleries and other arts organisations, aggregators and funders. The Arts Council and the national film archive community were cited as two such organisations or communities of interest.