Friday 8 June 2012

Development of CALM 9.3: towards a simple linked data indexing tool

One of the key components of our Step change project is the development of Axiell's software products CALM and CALMVIEW to support some linked data functionality. Both applications are widely used within the UK archival community. The current public release of CALM is version 9.2, and I have been supplied with a release of 9.3 to use with our catalogue data in a test environment here.

For some years there has been a structure of keen and active CALM user groups, with a formal dialogue between these groups and Axiell about the ongoing refinement and development of these products. Very simply, within the time and resource confines of the Step change project, we are looking to develop tools that will:
  1. allow CALM users to interrogate linked data services, return results and insert selected URIs into CALM records
  2. allow these links to be displayed in the CALMVIEW web front end to the CALM application
  3. allow CALMVIEW to expose data from the CALM application in RDF so that it in turn can be interrogated by other services
In this entry I will discuss the development of point 1 (points 2 and 3 will emerge a little later in the project). There are two aspects to the functionality in v.9.3: i) an administration facility to allow configuration and testing of current and new linked data services, and ii) a user facility to link URIs into CALM records. Let's look at the administration part first:
From the main administration menu, we go to a Linked Data submenu and can see the default linked data services which have been configured. These are AIM25, the British Library's British National Bibliography (BNB) and Wikipedia (DBpedia). At this point we can add, remove or edit/test these services.

We can also see how the link to AIM25 is initially configured:

Next we can see the XSLT used to transform XML received from the selected service into CALM's XML.
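
For readers who have not met XSLT, the job of such a transform can be pictured with a rough equivalent in Python. This is only a sketch under assumptions: the element names on both sides (results, result, label, uri, Links, Link, Term, URL) and the example.org addresses are invented for illustration. They are not the real AIM25 response or CALM's XML format, and CALM itself does this step with XSLT rather than Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical response from a linked data service. The element names and
# URIs are made up for this sketch and are not AIM25's real output.
SERVICE_XML = """
<results>
  <result>
    <label>Churchill, Clementine</label>
    <uri>http://example.org/id/person/1</uri>
  </result>
  <result>
    <label>Churchill, Winston</label>
    <uri>http://example.org/id/person/2</uri>
  </result>
</results>
"""


def to_target_xml(service_xml: str) -> ET.Element:
    """Map each <result> in the service XML to a <Link> with Term and URL."""
    source = ET.fromstring(service_xml)
    # Hypothetical target structure; CALM's actual XML is not shown here.
    target = ET.Element("Links")
    for result in source.findall("result"):
        link = ET.SubElement(target, "Link")
        ET.SubElement(link, "Term").text = result.findtext("label", default="")
        ET.SubElement(link, "URL").text = result.findtext("uri", default="")
    return target


if __name__ == "__main__":
    # Print the restructured XML as a single string.
    print(ET.tostring(to_target_xml(SERVICE_XML), encoding="unicode"))
```

The point is simply that each service returns XML in its own shape, and a transform per service reshapes it into the one structure the application expects.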

At this point we can also select the test function and send some text to search the service and see what is returned. Here we have searched the AIM25 service with the text 'Churchill' and can see some XML returned.

This is transformed by the XSLT to CALM XML

and we can see the results processed below:

You can also use the Admin menu to determine which databases in CALM can be linked to that service; you are not restricted to your catalogue database, for example. So that's the admin part. What about linking a CALM catalogue record to a URL from one of these services? In the catalogue menu, you select the Authorities menu on the left, and you now see a Linked Data button.


Now you decide which service to link to:
You then decide which fields you would like the service to search against, and you can add your own free text to generate extra searches. Here I've selected the title field, which contains the text 'Clementine Churchill', and this is what I want to search against AIM25:


Now I see the results coming back from AIM25, and I've selected the Clementine Churchill person authority:
I then use the utility to post this URL into the CALM catalogue record:
Here I've done the same with the DBpedia service. You can see I get two Clementine Churchill topic results. After checking the URLs these turn out to be duplicates, so I end up having to delete one from the catalogue record...
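
A check like this could in principle be done before the links reach the record. As a minimal sketch, assuming the results arrive as (label, URL) pairs (an assumption of mine, not how CALM holds them) and that two results count as duplicates when their URLs are identical apart from stray whitespace:

```python
def dedupe_by_url(results):
    """Keep the first result for each URL, preserving the original order."""
    seen = set()
    unique = []
    for label, url in results:
        key = url.strip()  # ignore leading/trailing whitespace only
        if key not in seen:
            seen.add(key)
            unique.append((label, key))
    return unique


if __name__ == "__main__":
    # Placeholder URLs, not real DBpedia addresses.
    hits = [
        ("Clementine Churchill", "http://example.org/resource/A"),
        ("Clementine Churchill", "http://example.org/resource/A"),
    ]
    print(dedupe_by_url(hits))
```

This only catches exact repeats. Two different URLs that describe the same person would still need a human eye.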
Here's the catalogue record with links to both services in place:
Testing is all in the very early stages but some interesting (and I think important) points have emerged already:

1. Configuring or adding new Linked Data services to CALM: Axiell have designed this process to be generic and extensible. However, doing so requires the knowledge and ability to write XSLT, which will be beyond most archivists (me included!). So where does this leave us? A couple of options will likely emerge. Firstly, new services may be added in CALM upgrades through the User Group requests for enhancements process. Secondly, those CALM users with access to technical assistance may well be able to produce the XSLT transforms needed to add new services and share these through the User Group process.

2. Making CALM find the internet: The Linked Data function in CALM means that the application now needs to reach the internet to do its searching. This is potentially problematic and will very much depend on the particular user's technical environment and how user access to the internet is configured. Axiell, the technical support team at Cumbria County Council ICT and I have spent some considerable time testing and configuring to make this happen. Here in Cumbria we use a proxy server to access the internet, and users have to enter username and password credentials when prompted. The proxy server then compares these credentials against a database and, if they are correct, allows access to the internet. Axiell have therefore had to develop CALM to prompt for these internet credentials if required, and we have found this works. In the near future we are likely to upgrade to a new proxy server which will check the user's Active Directory credentials and work straight off these, providing a cleaner, simpler route to the internet, and this should bypass the need to enter your internet credentials in CALM. However, until CALM 9.3 has been tried in a range of other user environments, it will not be possible to guarantee that internet access will work everywhere.

3. Finding and determining meaningful results: Even after just a few minutes' use one can see some interesting issues emerge. The BL BNB service doesn't usually return anything from the text found in the fields of most of our CALM catalogue records, as it interprets the text within each field as a complete search term. However, using the free-text extra search box (without any CALM fields selected) to enter a crisper, more precise word or string of text usually does work. The DBpedia results seen in CALM from a search certainly need to be treated with caution. For example, when parsing a catalogue record relating to some local police documents, with fields containing text such as 'Whitehaven Police Station', 'Crime' and 'Law', I had results back from DBpedia as follows: 'Whitehaven', 'Crime', 'Law'. The Whitehaven URL in fact related to an entry about Whitehaven Railway Station, the Crime URL related not to an entry about criminality but to an entry for a Californian rock band called Crime, and the Law URL related not to legislation but to an entry about the actor Jude Law. Hmmmm....all good food for thought about future work in this area.

4. Useful services?: As things stand, three default services were configured because they were readily accessible and of some possible use to archivists and archive users. But the key services for archivists are as yet still in development or not readily accessible (I'm thinking of things like National Register of Archives, Manorial Documents Register etc). And there are major issues of course relating to person/corporate name authorities or geo name authorities which are essential areas for development, possibly by way of some sort of national brokering service to make these available in the same way that AIM25 is providing a sort of de facto UK Archival Thesaurus (UKAT) brokering service.
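
On point 2 above, the idea of handing a username and password to a proxy before a request can reach the internet can be sketched in a few lines of Python using its standard urllib library. This is an illustration only and not how CALM implements it: the proxy address is a placeholder, the function name build_proxy_opener is my own invention, and I am assuming the proxy accepts HTTP Basic authentication, which may not match every environment.

```python
import urllib.request


def build_proxy_opener(proxy_url: str, username: str, password: str):
    """Return an opener that routes requests through an authenticating proxy.

    Assumes the proxy uses HTTP Basic authentication.
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # A realm of None means: use these credentials whatever realm the
    # proxy announces when it asks for authentication.
    password_mgr.add_password(None, proxy_url, username, password)
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.ProxyBasicAuthHandler(password_mgr),
    )


# Example use (placeholder proxy address and credentials, no request is sent
# until opener.open(...) is called):
# opener = build_proxy_opener("http://proxy.example.org:8080", "user", "secret")
# response = opener.open("http://example.org/")
```

With an Active Directory based proxy of the kind mentioned above, the credentials would come from the logged-in session instead, so a prompt like this should no longer be needed.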

I'll be spending the next few weeks using CALM 9.3 to add links to many hundreds of catalogue records, so that we can see the results in a test version of CALMVIEW, which will allow proper user testing.
