AIM25 Step change and Open Metadata Pathway: July 2012

Wednesday, 4 July 2012

CALM User Testing

We organised a focus group session on Thursday 28th June at the Wellcome Trust to review progress so far on adapting the CALM backend to query external services and generate and store RDF. The meeting comprised a mix of CALM archivists and some professionals familiar with cataloguing processes. The main purpose of the meeting was to use a CALM 9.3 development environment to test the robustness of workflows for analysing catalogue and authority data; comment on the quality and sources of external data; and review improvements to the front end - CALMVIEW - that will publish appropriate service links. The CALMVIEW linking between the test 9.3 installation and the internet could not be configured on the day, as CALMVIEW development is still under way, but a screenshot of the AIM25 link was shown to the participants.

The underlying rationale is that Linked Data processing, sharing and exporting from CALM should become as normal and integral a part of cataloguing as possible: one that does not require an immense investment in additional work or process on the part of archivists, or a detailed technical knowledge of RDF that archivists cannot realistically be expected to acquire.

The main workflow for analysis of sample Wellcome Library and Cumbria catalogues was tested out using the UKAT service, DBpedia and the BL British National Bibliography, to cover archival, biographical and bibliographical-type material. Key improvements/findings requested were:

Improved bulk analysis of records (to speed up processing)
Preview of the resolved multiple service returns before embedding (to overcome the problem of poor quality external data being selected or similar-sounding names of people and places being mistakenly chosen, or, for example, to preview and select the correct edition of a multi-edition printed publication)
The ability for archivists in the back end to refine and select only certain records for publication (necessary because some services only return dirty data or data strings, which are of little value to researchers)
Demarcation of front end presentation of Linked Data links from host catalogue data to minimise confusion as to the origin of the data source
The need for the archive profession to agree to the creation of a priority list of Linked Data services that would be useful for professionals and users, such as the NRA and specialist vocabularies.

The focus group was followed by a meeting of the CALM User Group, at which CALM representatives outlined the release schedule.

Tuesday, 3 July 2012

User Interface and First Steps Toward Meshups

When we originally conceptualized how we might begin to automatically incorporate collections data into Historypin, we imagined having the ability to peer into the past and be able to reach into various collections nearby to pull out relevant information. Ultimately, this is still what we're working toward, but a number of complications prevent this from being feasible at the moment. But it's worth sharing some of our mockups for how we saw this working.

[Fig 1] This mockup shows the existing experience of historical photos overlaid in Street View on the Historypin site. We've added a "Dig Deeper" panel on the right which initiates a call to the Step change service based on the date and location.

[Fig 2] This mockup shows the results of the query to the Step change service, including relevant AIM25 collections that may relate to the date and location referenced.

[Fig 3] Selecting one of the collections would return more information about the collection, including the location of the collection and a link to the collection webpage.

There are a number of reasons why this execution is not quite practical yet, though it may be feasible in a future project. The primary complication here is the signal-to-noise ratio when in London: so many of the collections within AIM25 are relevant to London, but the geographic detail in the collection metadata rarely matches the level of granularity that you get when in Street View. If we ask for relevant collections within a 1 mile radius of a specific latitude and longitude, for instance, we may get back 200-300 collections with little clue as to why each collection is relevant to this location.
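To make the radius query concrete, here is a minimal sketch in Python of filtering geotagged collections by great-circle distance from a Street View position. It is not the Step change service's actual implementation: the record layout, the function names, the collection titles and the coordinates are all assumptions made up for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, approximate


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def collections_within(collections, lat, lon, radius_miles=1.0):
    """Return the collections tagged within radius_miles, nearest first."""
    hits = []
    for c in collections:
        d = haversine_miles(lat, lon, c["lat"], c["lon"])
        if d <= radius_miles:
            hits.append((d, c))
    hits.sort(key=lambda pair: pair[0])
    return [c for _, c in hits]


# Hypothetical records: titles and coordinates are illustrative only.
sample = [
    {"title": "Collection A", "lat": 51.5258, "lon": -0.1339},
    {"title": "Collection B", "lat": 51.5115, "lon": -0.1160},
]

# A made-up Street View position in central London.
nearby = collections_within(sample, 51.5265, -0.1320, radius_miles=1.0)
print([c["title"] for c in nearby])
```

A filter like this only says that a collection is tagged nearby. It says nothing about why the collection matters to that spot, which is the signal-to-noise problem described above.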

Another problem is what we've been calling the Needle In A Haystack problem, which appears once you get away from London and into other parts of the world. While the AIM25 collections are largely in Greater London (sorry if I'm getting my terminology wrong--I'm an American!), there are many collections that are relevant to other parts of the world. Rory has done an amazing job parsing out locations from the collections metadata and using Geonames to resolve these locations. So we can now see that a particular collection may have relevance to locations in China, for instance, which is one of the locations we've been testing with. Here, our problem is that we've got just one or two collections and they are geotagged for a small town where someone lived. So unless we set a really large bounding box, or you happen to be in Street View in that town, you'd never learn about that collection, even though it has documents pertinent to many locations in China and Tibet.
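The bounding-box trade-off can be sketched the same way. This is only an illustration, assuming a simple box measured in degrees around the viewer's position. The function name and the coordinates are made up, and the check ignores boxes that cross the 180th meridian.

```python
def in_bounding_box(lat, lon, centre_lat, centre_lon,
                    half_height_deg, half_width_deg):
    """True if (lat, lon) lies inside a degree-based box around the centre.

    Simplification: does not handle boxes that wrap across longitude +/-180.
    """
    return (abs(lat - centre_lat) <= half_height_deg
            and abs(lon - centre_lon) <= half_width_deg)


# Illustrative coordinates only: a collection tagged to a single town,
# and a viewer in Street View several hundred miles away.
collection_lat, collection_lon = 29.65, 91.10
viewer_lat, viewer_lon = 30.66, 104.07

# A tight box around the viewer misses the collection entirely.
small_box_hit = in_bounding_box(collection_lat, collection_lon,
                                viewer_lat, viewer_lon, 0.5, 0.5)

# Only a very large box finds it.
large_box_hit = in_bounding_box(collection_lat, collection_lon,
                                viewer_lat, viewer_lon, 15.0, 15.0)

print(small_box_hit, large_box_hit)
```

A box wide enough to catch the collection here would bring back the noise problem in a dense area like London, so a single fixed box size is unlikely to suit both cases.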