Thursday, 28 July 2011

Licensing

Linked data, amongst the many challenges it presents, requires licensing appropriate to its intended uses. Under US law at least, compiling a database is not regarded as a creative act, so the compilation may not attract copyright. Creative Commons licences are therefore probably not appropriate for licensing linked data. The Open Data Commons licences (http://opendatacommons.org/licenses/), defined by the Open Knowledge Foundation, appear to be a more appropriate choice for this purpose.

Open Data Commons includes three licences. The Public Domain Dedication and License (PDDL) places the data in the public domain and waives all rights. The Attribution License (ODC-By) allows the data to be shared and adapted provided it remains attributed. The Open Database License (ODbL) grants the same rights provided any adaptations are distributed under the same licence.

The licences are described as RDF triples on the website, for instance:
    <rdf:RDF
        xmlns:cc="http://creativecommons.org/ns#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dcq="http://purl.org/dc/terms/">
      <cc:License rdf:about="http://opendatacommons.org/licenses/odbl/1.0/">
        <cc:legalcode rdf:resource="http://opendatacommons.org/licenses/odbl/1.0/"/>
        <dcq:hasVersion>1.0</dcq:hasVersion>
      </cc:License>
    </rdf:RDF>

This defines the Open Database License. Because the licences are themselves published as RDF, they can readily be incorporated into any linked data application. Discussions with partner institutions within AIM25 will establish the preferred licence, which is expected to be ODbL.
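The licence description is ordinary RDF/XML, so even a general-purpose XML parser can extract its triples. The following minimal Python sketch uses only the standard library's `xml.etree.ElementTree`. It assumes the simple, flat shape of the ODbL snippet above. It is not a full RDF/XML parser, and a dedicated RDF library would be the more robust choice in a real application. The helper names `clark_to_uri` and `licence_triples` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The ODbL 1.0 description reproduced from the snippet above.
ODBL_RDF = """<rdf:RDF
    xmlns:cc="http://creativecommons.org/ns#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcq="http://purl.org/dc/terms/">
  <cc:License rdf:about="http://opendatacommons.org/licenses/odbl/1.0/">
    <cc:legalcode rdf:resource="http://opendatacommons.org/licenses/odbl/1.0/"/>
    <dcq:hasVersion>1.0</dcq:hasVersion>
  </cc:License>
</rdf:RDF>"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"cc": "http://creativecommons.org/ns#"}


def clark_to_uri(tag):
    """Turn ElementTree's '{namespace}local' tag into a full URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def licence_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for each cc:License node.

    Handles only the flat shape shown above: a typed node whose children
    are either rdf:resource references or plain literals.
    """
    root = ET.fromstring(rdf_xml)
    triples = []
    for node in root.findall("cc:License", NS):
        subject = node.get("{%s}about" % RDF_NS)
        # A typed node element such as <cc:License> implies an rdf:type triple.
        triples.append((subject, RDF_NS + "type", clark_to_uri(node.tag)))
        for prop in node:
            obj = prop.get("{%s}resource" % RDF_NS)
            if obj is None:  # no rdf:resource, so treat the content as a literal
                obj = (prop.text or "").strip()
            triples.append((subject, clark_to_uri(prop.tag), obj))
    return triples


for triple in licence_triples(ODBL_RDF):
    print(triple)
```

The sketch yields three triples: the licence's type, its legal code and its version. These are the same statements an application would store when recording which licence applies to a dataset.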

For the record, a statement is being added to all descriptions covering three points. It identifies AIM25 as the origin of the data. It notes that AIM25 is a partnership in which contributing members have rights. It also states that the data is otherwise freely available for reuse, as it has been since AIM25's inception some 11 years ago. For an example of such reuse, see the many AIM25-created entries for the Women’s Library that are now also offered in the Archives Hub. The statement will reference the Open Data Commons licence adopted.

An issue still to be addressed, however, is thesaurus licensing. UKAT is based on the UNESCO Thesaurus. UNESCO gave permission for its development, but we assume further permission will be required for further use and embedding. The same applies to MeSH, the Gay and Lesbian vocabulary and other vocabularies developed by partner organisations. We have also used the Getty Art & Architecture Thesaurus selectively.

Gareth Knight and Patricia Methven

Wednesday, 27 July 2011

Second Archivists' Focus Group

A second archivists' focus group was convened on Wednesday 27th July. In attendance were representatives of Senate House Library, Wandsworth Heritage Services, the London School of Economics, ULCC, King's College London, London Metropolitan Archives, the Institute of Education and The National Archives.

The group reviewed progress since the last meeting and viewed a demonstration of the new back-end editing area of AIM25. The back end allows one, several or all ISAD(G) fields in a collection-level description to be analysed in three ways:
- against the existing AIM25 version of UKAT;
- against UKAT using a fuzzy-match option to identify synonyms;
- against the OpenCalais entity-extraction service.

The results are listed alphabetically alongside the record in collapsible lists. The entities are also highlighted in the text, with colour coding distinguishing subjects, place names, personal names and corporate names; triples are highlighted too, where they were identified. Analysis normally takes a matter of seconds, though longer records can time out.
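The exact and fuzzy UKAT passes can be illustrated with a minimal Python sketch. It runs an exact pass over preferred terms, then a fuzzy pass using the standard library's `difflib` similarity ratio. The post does not say how AIM25 implements its fuzzy match, so `difflib` is a stand-in, and the OpenCalais step is not shown. The thesaurus entries, the entity-type labels and the 0.85 cutoff are all hypothetical.

```python
import difflib
import re

# Hypothetical sample of thesaurus entries: preferred term -> entity type.
# The entity type is what would drive the colour coding in the editing area.
THESAURUS = {
    "Women's suffrage": "subject",
    "Hospitals": "subject",
    "London": "place",
    "Nightingale, Florence": "person",
}


def match_terms(text, thesaurus, cutoff=0.85):
    """Return (term, entity_type, match_kind) tuples, sorted alphabetically.

    match_kind is "exact" when a preferred term appears verbatim (ignoring
    case) and "fuzzy" when only a close variant of it appears.
    """
    by_lower = {term.lower(): term for term in thesaurus}
    text_lower = text.lower()
    hits = {}

    # Pass 1: exact whole-phrase matches against the preferred terms.
    for key, term in by_lower.items():
        if re.search(r"\b" + re.escape(key) + r"\b", text_lower):
            hits[term] = "exact"

    # Pass 2: fuzzy matches. Compare every single word and adjacent word
    # pair in the text with the terms, using difflib's similarity ratio.
    words = re.findall(r"[\w']+", text_lower)
    candidates = set(words) | {" ".join(pair) for pair in zip(words, words[1:])}
    for candidate in candidates:
        for key in difflib.get_close_matches(candidate, list(by_lower), n=1, cutoff=cutoff):
            hits.setdefault(by_lower[key], "fuzzy")  # never downgrade an exact hit

    return sorted((term, thesaurus[term], kind) for term, kind in hits.items())


record = "Papers relating to the campaign for womens suffrage and to hospitals in London."
for hit in match_terms(record, THESAURUS):
    print(hit)
```

On this sample, "Hospitals" and "London" match exactly. "Women's suffrage" is found only by the fuzzy pass, because the record spells it "womens". Returning the hits sorted mirrors the alphabetical lists shown alongside the record.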

A mouse roll-over has been added to each term in the text. It lets users specify what the term refers to by ticking options in a drop-down menu, populated via a connection to one or more external services including GeoNames. For example, is 'London' the capital of the UK, a place in Canada, an author or part of a corporate name? Triples can also be examined in roll-overs, so that the editing archivist can validate or clarify these entities. These choices can be saved and then exported. The enhanced content will be presented in a new front end for the test records, demonstrating linking with external services. The aim is to enhance the user experience by pulling together reliable external information on a place, name, subject, etc. relevant to that collection.

The discussion centred on how the editing process could be speeded up. One option is to 'sign off' the capital-of-the-UK reading of 'London' after reviewing its first instance, applying it to every occurrence of 'London' across all ISAD(G) fields (in effect, 'treat all subsequent occurrences of London in this record as the capital'). Linking with the NRA is desirable in order to identify authority terms set out in NCA format. Linking with the Library of Congress was raised as an important deliverable. It would maximise the opportunity for synergy between archive and library descriptions, particularly in local authority record offices.
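The 'sign-off' idea amounts to propagating one reviewed decision to the other occurrences of the same term in a record. Here is a minimal Python sketch of that, assuming a hypothetical `Occurrence` record per term found in each ISAD(G) field. The sense labels are illustrative strings rather than real GeoNames identifiers. One design choice is assumed rather than taken from the discussion: occurrences the archivist has already resolved individually are not overwritten. For example, 'London' inside 'London County Council' might deliberately be marked as part of a corporate name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Occurrence:
    field: str                   # ISAD(G) field in which the term was found
    term: str                    # the term as it appears in the text
    sense: Optional[str] = None  # the archivist's chosen reading; None = unreviewed


def sign_off(occurrences: List[Occurrence], term: str, sense: str) -> int:
    """Apply one reviewed reading of `term` to every unreviewed occurrence
    of the same term in the record, and return how many were updated.

    Occurrences already resolved individually are left alone, so a
    deliberate exception is never overwritten.
    """
    updated = 0
    for occ in occurrences:
        if occ.term == term and occ.sense is None:
            occ.sense = sense
            updated += 1
    return updated


record = [
    Occurrence("Title", "London"),
    Occurrence("Administrative / Biographical history", "London", "London, Ontario"),
    Occurrence("Scope and content", "London"),
    Occurrence("Scope and content", "Westminster"),
]
print(sign_off(record, "London", "London, capital of the UK"))  # 2
```

Here the two unreviewed 'London' occurrences take the signed-off reading. The one already marked as London, Ontario keeps the archivist's choice, and 'Westminster' is untouched.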

The question of keeping UKAT up to date was raised. Given constrained budgets and workloads, maintaining an RDF version of the AIM25 UKAT must require minimal ongoing effort. The analysis reveals the limitations of the existing thesaurus. It also shows that external services such as OpenCalais could, and should, be enhanced with input from ALM (archive, library and museum) thesauri and vocabularies. This requires a conversation between JISC or key UK institutions on one side and OpenCalais and similar services on the other.

Next steps are to improve and tidy the editing area (for example by changing the colour coding), to plug it into the front end for the test records, and to explore collaboration with the NRA and ARCHON.

Monday, 18 July 2011

Archivists' focus group

King's College Archives recently hosted a focus group of leading London archivists familiar with using AIM25. The purpose was to understand how Linked Data approaches might speed up the archivist's behind-the-scenes editing work and improve the front-end user experience. In attendance were representatives of Senate House Library, the London School of Economics, Wandsworth Heritage Services, the London Metropolitan Archives, the British Postal Museum, the Institute of Education and the University of London Computer Centre. Development work on new AIM25 records was showcased.

OpenCalais was demonstrated in real time, and members tested it on sample data and compared the results. Subject-term creation emerged as an area of potential concern. OpenCalais was developed by Reuters to support news and current-affairs services, and its terms tend to reflect this focus. Members called for more input from archive vocabularies so that OpenCalais's corpus could be enriched with higher education and other terminology. It was also suggested that Linked Data could provide fuzzy matching between formal, if rather arcane, UNESCO-style subject terms and terms in more popular use, to encourage discovery and take-up. A further suggestion was that the UK Archival Thesaurus could be enhanced and made available in a SKOS version.
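A SKOS version of UKAT could carry both kinds of term for the same concept. The formal UNESCO-style heading becomes the preferred label (`skos:prefLabel`), and popular equivalents become alternative labels (`skos:altLabel`). A search on either can then resolve to the same concept. The following minimal Python sketch writes one such concept in Turtle. The `example.org` URIs, the term 'Women's suffrage' and its alternative label 'Votes for women' are hypothetical. The sketch does not reflect UKAT's actual URIs or content.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def turtle_literal(text, lang="en"):
    """Quote a string as a Turtle language-tagged literal.

    Escapes only backslashes, double quotes and newlines, which is
    enough for short thesaurus labels.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def skos_concept(uri, pref_label, alt_labels=(), broader=None, lang="en"):
    """Serialise one thesaurus term as a SKOS concept in Turtle.

    The formal heading becomes skos:prefLabel; terms in more popular use
    become skos:altLabel; an optional parent term becomes skos:broader.
    """
    parts = ["a skos:Concept", "skos:prefLabel " + turtle_literal(pref_label, lang)]
    parts += ["skos:altLabel " + turtle_literal(alt, lang) for alt in alt_labels]
    if broader is not None:
        parts.append(f"skos:broader <{broader}>")
    return f"<{uri}> " + " ;\n    ".join(parts) + " ."


print(SKOS_PREFIX)
print(skos_concept(
    "http://example.org/ukat/womens-suffrage",
    "Women's suffrage",
    alt_labels=["Votes for women"],
    broader="http://example.org/ukat/political-rights",
))
```

Generating the RDF from the thesaurus source, rather than editing Turtle by hand, is one way to keep the ongoing maintenance effort low.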

Practical value to hard-pressed archivists came up time and again in conversation. Most archivists have neither the time nor the budget to experiment; they need practical tools that they can plug into their work without fuss. Quantifying the benefits of Linked Data is vital to sell the approach to funders and institutional management. Cross-domain services are an important attraction, surfacing archive information and linking it with books and museum content. The benefits of linking to Wikipedia via DBpedia were raised, since DBpedia lies at the centre of the Linked Data universe. Biographical content could be imported wholesale from other sources and adapted for use in a particular record, saving the time needed to research and write one from scratch.

The plans of proprietary suppliers such as Axiell and Adlib were raised as an issue: do they plan to incorporate Linked Data tools in future versions of their archive management software? The role of Google was also discussed: does it have any Linked Data plans and, if not, why not?

The question of which ISAD(G) fields to include in Linked Data work was raised. It was argued that focusing only on Scope and Content would be a mistake. The authority information in the Administrative/Biographical History field is particularly valuable, as are the related-records fields. Linking to the NRA to surface related collections was discussed.

Panel members discussed indexing. Editors of AIM25, the Archives Hub or similar tools should be able to draw on Linked Data to improve the indexing of personal, corporate and place names in new and existing records. The ability to run existing records retrospectively through OpenCalais was flagged as an important requirement: archivists are more likely to embrace Linked Data if they can painlessly re-index their current content. Linked Data offers the opportunity for more automation and faster indexing. Both would be particularly useful for smaller archives without cataloguing expertise.

Next steps included three strands. The first was further development of the indexing tools, in order to compare the workflow with traditional methods. The second was building a prototype front-end delivery system to enhance collection-level descriptions. The third was engaging in conversation with Google and others to identify best practice.