Friday, 5 October 2012

Lessons learned part 2

5. Usability

The recent JISC programme round-up in Birmingham highlighted a number of potential common problems or issues thrown up by the projects including the practicality of APIs and licensing, but I would add user experience to the list.

Linked Data discussions in the field of libraries, archives and museums have generally, and until recently, been confined to discussion of the complex technical challenges involved in making systems work. This is understandable, but recent discussions such as those held at the 2nd Linked Data in Libraries conference in Edinburgh on 21st September, have focused attention on making Linked Data as comprehensible and usable as possible by the general public. Papers on the consumption of data included introductions to the important work on visualisation currently under way. Linked Data provides the opportunity to move beyond the conventional cataloguing paradigm and its corollary, published lists and tables of data that risk being seen as visually unappealing and stodgy (sometimes unfairly). The complex relationships described in Linked Data don't translate easily to tables but rather lend themselves to the dynamic graphs and representations now becoming common, for example in the display of statistics by national governments. Linked Data provides an opportunity to begin to see data in new ways for library, archive and museum users.

Fundamentally, for Linked Data to be more widely adopted, there needs to be a focus on the user experience and on demonstrating the value added by combining sources, mapping sources using Linked Data and other practical improvements.

Step change set out to address the usability concern from the outset, but the project has highlighted how much work needs to be done in this area. CALM improvements include the display of relevant external links alongside catalogue records - for example British National Bibliography entries. User testing established that archivists need to exercise discretion in the links they set up and make visible (whatever is happening in the back-end). Links must work (accurate and complete data is returned speedily), but must also be relevant (for example appropriate to the level of record being displayed). Branding starts becoming important to distinguish the origin of data and mitigate a tendency for users to view the data in archive, library and museum websites as coming from that one source (that repository). Users will need to start viewing such websites more as they have learned to interrogate a page of Google search returns (as coming from multiple sources). A simple 'Linked Data' logo should be adopted to provide users with a shorthand way of recognising that an additional level of useful information is now available and can be trusted (because an information professional has actively checked the source and chosen to link it).

Next steps:

Further user testing is now under way following the release of the Linked Data CALM and its front-end. This will take place in Cumbria involving users of CALM and members of the public familiar with the current archive website. This will drive improvements ready for the release of CALM version 10. Work is under way on creating RDFa and the rendering of selected terms to display useful external content in an attractive way, while not confusing the user with excess information.


Friday, 7 September 2012

Lessons learned

The Step change project has identified a number of useful 'lessons learned' - more will follow in future posts.

1. Data quality

The creation of RDF and linking with similar resources might expose legacy catalogue data as uneven, inadequate or inaccurate. It is likely that many existing catalogues, though adequate for basic online searching, are not up to the task in a Linked Data environment. Date ranges cited in archive catalogues are often too broad to identify components of collections; geographical designations are insufficiently specific or too fuzzy (does 'London' mean Charing Cross or Croydon, the City or London, Canada? Which units are being described, and are they historically accurate?). The reality is that many catalogues predate not only RDF but the internet, and arguably are not fit for purpose in a Google-enabled search environment, either being inaccessible to search engines or not optimised for web-crawling.

Next steps:

Review of links: while an archivist or librarian might be familiar with their own collections, they are likely to be unfamiliar with each other's content, or content from unrelated sources (such as maps, audio-visual material or database content). A real example encountered in Step change was the join-up between archive collection descriptions and bibliographic information using the BNB, where archivists accessing the live service in CALM were often unable to identify, and therefore select for linking, the correct edition of an author's publication to match the relevant archive description by, or about, that author - the service returned ambiguous or difficult-to-interpret bibliographic data. Confronted with practical problems such as these, the professional focus group, which convened to review the markup tool embedded in CALM, recommended the implementation of an editing stage into CALM to preview possible selections of Linked Data join-ups, in order to minimise potential mistakes and make mark-up more efficient by reducing the necessity for time-consuming corrections post facto.

Knowledge transfer: Furthermore, the linking preview problem clearly exposes the cross-disciplinary knowledge gap that hinders join-up between collections, except at the level of broad categories, mapped across domains. Librarians, archivists, museum curators, academic experts and GIS and data curators simply don't know enough about each other's data to make the truly informed decisions that will underpin the entity relationship-identification and entity relationship-building that is at the heart of the successful implementation of Linked Data methodologies.

Outcome and next steps: Axiell is considering incorporating an improved editing tool in future releases of CALM. For the mapping component of the project for AIM25, a preview tool has been developed and installed in the Alicat cataloguing utility that uses the Google maps API and Geonames to preview the names of places in micromaps, to allow the archivist to make speedier, more accurate choices of placenames before hitting the 'save' button.
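As a rough illustration of the kind of lookup a placename preview depends on, the sketch below builds a GeoNames search request and turns a response into short lines an archivist could check by eye. It is not the Alicat code: the function names are hypothetical, the `demo` username is a placeholder, and the sample response is an illustrative cut-down of the fields used here rather than live output.

```python
from urllib.parse import urlencode

# Public GeoNames search endpoint (JSON flavour).
GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def build_search_url(place, username, max_rows=5):
    """Build a GeoNames search URL for a free-text place name."""
    params = {"q": place, "maxRows": max_rows, "username": username}
    return GEONAMES_SEARCH + "?" + urlencode(params)


def preview_lines(response):
    """Turn a parsed search response into short, human-checkable lines."""
    lines = []
    for hit in response.get("geonames", []):
        lines.append(
            "{name}, {countryName} ({lat}, {lng}) id={geonameId}".format(**hit)
        )
    return lines


# Illustrative response, trimmed to the fields used above (example values only).
sample = {
    "geonames": [
        {"name": "London", "countryName": "United Kingdom",
         "lat": "51.50853", "lng": "-0.12574", "geonameId": 2643743},
        {"name": "London", "countryName": "Canada",
         "lat": "42.98339", "lng": "-81.23304", "geonameId": 6058560},
    ]
}

url = build_search_url("London", username="demo")
candidates = preview_lines(sample)
for line in candidates:
    print(line)
```

Showing the country and coordinates beside each candidate is what lets the cataloguer tell one 'London' from another before saving; a map preview does the same job visually.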

Step change's publication of UKAT as a Linked Data service helps overcome the knowledge gap as it at least provides an agreed subject, place, person and corporate name listing as a common starting point in describing certain entities. What it doesn't do is capture relationships, and more work needs to be done to describe subject and domain-specific triples. A publicly-supported triplestore would be an important infrastructure development that would give professionals confidence that Linked Data is here to stay, and would encourage investment in embedding it in conventional cataloguing. Further steps are necessary, though, not least sponsorship of co-working between different knowledge professionals using cross-domain data - to properly document the challenges of mixing and matching library, archive and museum metadata and linking it with, say, research outputs in the arts and humanities.

The problem of inadequate catalogues is difficult to resolve - cataloguing backlogs are a higher priority than retroconversion and, should a catalogue be useful to potential researchers, it is usually deemed adequate. Training should be provided to potential cataloguers to understand better the implications of online search strategies and search engine optimisation (aside from Linked Data), which are probably poorly understood by most archivists. The use of certain agreed vocabularies should be encouraged where these exist as Linked Data, and the AIM25-UKAT service helps supply this need for an indexing tool that coincidentally creates RDF without archivists necessarily being aware that this is happening. Some agreement should be reached on other specialist vocabularies, name authorities and place data (including historical places - at least in the UK) to create established hubs. These will potentially be more robust and avoid a fragile cat's cradle of APIs prone to network disruption, and serve as trustworthy and authentic points of reference.

2. The value of public-private partnership

Step change was built on a good working relationship with a charity (We are what we do - responsible for Historypin), and a commercial vendor (Axiell). The rationale behind their involvement was that for Linked Data use to become widespread in libraries, archives and museums, it should be made available through the trusted suppliers upon which professionals have come to depend. Good will on both sides and in both cases enabled the team to overcome serious problems with enforced development staff absences. These challenges do point to a potential over-dependency on a relatively small number of experts able to combine knowledge of RDF technologies with knowledge of library, archive and museum data and practices.

The Axiell experience demonstrated, through the focus group and demo at the national CALM user group, and perhaps unexpectedly, that there is substantial interest from the archive community for Linked Data tools and understanding of their utility.

Next steps: Axiell is releasing the embedded Alicat markup tool in CALM version 9.3 and has agreed to further iterations and improvements in future releases. Crucially, these will be timetabled in response to user feedback. Similar partnerships ought to be explored with other software suppliers, such as Adlib, and a meeting is planned with the UK Adlib user community and representatives from Adlib with this in mind.

3. Technical limitations of APIs

Considerable staff time needed to be set aside for dealing with poor quality responses to queries and trying to fine-tune services. Service reliability is essential if Linked Data approaches are to work. Significant obstacles were local firewalls and authentication protocols, and persuading local IT to address these concerns. Change requests for an experimental Linked Data project involving archive catalogues were understandably deemed to be low priority. They also carried a cost implication that needs to be factored into budgets.

Next steps: the cost implications of technical implementation need to be quantified and documentation published to provide institutional IT with context to make informed technical decisions - and persuade managers to authorise expenditure.

4. Value of co-operation

Step change sought to build a number of professional relationships to help leverage goodwill and kickstart a more strategic appreciation of the types of datasets that ought to be output as RDF. So far, datasets have mainly been confined to the library and museum sectors and have been created in an ad hoc way by interested experts, rather than with end users in mind. Discussions were held with The National Archives with a view to using the National Register of Archives dataset as a prototype name authority service. This, and other heavily used TNA services such as the Manorial Documents Register, would prove particularly valuable to the types of local authority archives participating in Step change, with their focus on local history. Test data relating to women in the NRA was released via TNA Labs through Talis' Kasabi service. The withdrawal of support for the service at very short notice provides a salutary lesson that the availability of commercial services cannot be taken for granted. The National Archives is currently renewing its backend systems and will review the status of the NRA, MDR, Archon and other databases in due course.

Discussions were held with other interested parties, not least those representing geographical data. Testing is due to commence with historical placenames supplied as part of the JISC DEEP project concerning the English Placenames Survey, relating to Cumbria, with a view to correctly locating and mapping catalogues.

As part of the CALM development work, a set of configuration instructions was published by Axiell to enable archivists to execute XSLT transforms and link to other services as they become available. The British Museum collections were identified as a good contender with which to test out these instructions, on account of the high quality data that they provide and the mutual political benefit: local institutions can demonstrate a link back to a major national collection held in London, and the BM can demonstrate that museum objects of local significance are being accessed by local people in an intelligent 'Linked Datery' way (for example mapping archaeological finds in the collection and linking with local catalogues or historical society publications). Work on testing this approach is still ongoing and conclusions will be presented in a future post.

Next steps: more cross-sectoral cooperation and scoping is required to think strategically about the kinds of datasets that different audiences need as Linked Data - archivists and different types of users - schools, the general public, genealogists, academics, researchers. Large national datasets that could benefit from unlocking include the Clergy of the Church of England Database, British History Online and the Victoria County History. Testing is due to begin with DEEP data and is ongoing with BM data.



Friday, 3 August 2012

Getting Closer to a Map Interface

We're continuing to get closer to a useful geographical interface for visualizing collections.  Using the AIM25-CALM service that Rory's created, we're able to call relevant collections based on time and latitude/longitude filters.  From the programming side, Rory created a test interface to experiment with what kind of data we would get with various calls to the service (fig 1).

[Fig 1]. Early tests of map showing collections returned from AIM25-CALM service.
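To give a feel for what a time and latitude/longitude filter does, here is a minimal sketch in Python. It is not the AIM25-CALM service itself: the `Collection` fields, the sample records and the `in_view` function are all hypothetical, and the bounding-box test is deliberately simple (it ignores boxes that cross the 180° meridian).

```python
from dataclasses import dataclass


@dataclass
class Collection:
    title: str
    lat: float
    lon: float
    start_year: int
    end_year: int


def in_view(c, south, west, north, east, year_from, year_to):
    """True if the collection sits inside the box and overlaps the date range."""
    inside = south <= c.lat <= north and west <= c.lon <= east
    overlaps = c.start_year <= year_to and c.end_year >= year_from
    return inside and overlaps


# Made-up records for illustration only.
collections = [
    Collection("Hospital records", 51.50, -0.12, 1850, 1948),
    Collection("Missionary papers", 29.65, 91.10, 1900, 1930),
    Collection("Parish maps", 51.52, -0.10, 1700, 1790),
]

# A box roughly covering London, filtered to 1900-1950.
hits = [c.title for c in collections
        if in_view(c, 51.3, -0.5, 51.7, 0.3, 1900, 1950)]
print(hits)
```

The date test checks for overlap rather than containment, so a collection spanning 1850-1948 still matches a 1900-1950 window.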
One of the things that we've been experimenting with is the granularity, which defines how many levels of detail exist for a placename.  For instance: country/province/city/ or country/state/county/city/. These administrative districts vary by country, but they help us improve the signal-to-noise ratio. As you can see in Figure 2, the level 4 granularity returns more specific information in France than it does in the United Kingdom.

[Fig 2]. Experimenting with maximum granularity yields more results in France than the UK.  
As we move forward with this, we'll be further testing the returns of the service, and refining the user interface to integrate into the Historypin environment.
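The granularity idea can be sketched as cutting each placename hierarchy off at a chosen level and grouping collections by what remains. The hierarchies, collection names and function names below are hypothetical and are not taken from the service.

```python
def truncate_place(hierarchy, max_level):
    """Keep at most max_level components of a place hierarchy, broadest first."""
    if max_level < 1:
        raise ValueError("max_level must be at least 1")
    return hierarchy[:max_level]


def group_by_granularity(records, max_level):
    """Group collection titles by their place, cut off at max_level."""
    groups = {}
    for title, hierarchy in records:
        key = " / ".join(truncate_place(hierarchy, max_level))
        groups.setdefault(key, []).append(title)
    return groups


# Made-up examples: the UK chain has four levels, the French one only three.
records = [
    ("Collection A", ["United Kingdom", "England", "Greater London", "Camden"]),
    ("Collection B", ["United Kingdom", "England", "Greater London", "Southwark"]),
    ("Collection C", ["France", "Île-de-France", "Paris"]),
]

by_level_3 = group_by_granularity(records, 3)
by_level_4 = group_by_granularity(records, 4)
for place, titles in by_level_4.items():
    print(place, titles)
```

Because countries divide themselves differently, the same level number can mean a city in one country and a region in another, which is why the same setting can behave differently in France and the UK.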

Wednesday, 4 July 2012

CALM User Testing

We organised a focus group session on Thursday 28th June at the Wellcome Trust to review progress so far on adapting the CALM backend to query external services and generate and store RDF. The meeting comprised a mix of CALM archivists and some professionals familiar with cataloguing processes. The main purpose of the meeting was to use a CALM 9.3 development environment to test the robustness of workflows for analysing catalogue and authority data; comment on the quality and sources of external data; and review improvements to the front end - CALMVIEW - that will publish appropriate service links. The CALMVIEW linking between test 9.3 installation and internet could not be configured on the day, as CALMVIEW development is still under way, but a screenshot of the AIM25 link was shown to the participants.

The underlying rationale is that Linked Data processing, sharing and exporting from CALM should become as normal and integral a part of cataloguing as is possible - one that does not require an immense investment in additional work or process on the part of archivists, or a detailed technical knowledge of RDF that it is unrealistic to expect them to obtain.

The main workflow for analysis of sample Wellcome Library and Cumbria catalogues was tested out using the UKAT service, DBpedia and the BL British National Bibliography, to cover archival, biographical and bibliographical-type material. Key improvements/findings requested were:

  • Improved bulk analysis of records (to speed up processing)
  • Preview of the resolved multiple service returns before embedding (to overcome the problem of poor quality external data being selected or similar-sounding names of people and places being mistakenly chosen, or, for example, to preview and select the correct edition of a multi-edition printed publication)
  • The ability for archivists in the back end to refine and select only certain records for publication (necessary because some services only return dirty data or data strings, which are of little value to researchers)
  • Demarcation of front end presentation of Linked Data links from host catalogue data to minimise confusion as to the origin of the data source  
  • The need for the archive profession to agree to the creation of a priority list of Linked Data services that would be useful for professionals and users, such as the NRA and specialist vocabularies.
The focus group was followed by a meeting of the CALM User Group, at which CALM representatives outlined the release schedule.

Tuesday, 3 July 2012

User Interface and First Steps Toward Meshups

When we originally conceptualized how we might begin to automatically incorporate collections data into Historypin, we imagined having the ability to peer into the past and be able to reach into various collections nearby to pull out relevant information.  Ultimately, this is still what we're working toward, but a number of complications prevent this from being feasible at the moment.  But it's worth sharing some of our mockups for how we saw this working.

[Fig 1] This mockup shows the existing experience of historical photos overlaid in Street View on the Historypin site.  We've added a "Dig Deeper" panel on the right which initiates a call to the Step change service based on the date and location.
[Fig 2] This mockup shows the results of the query to the Step change service, including  relevant AIM25 collections that may relate to the date and location referenced.
[Fig 3]  Selecting one of the collections would return more information about the collection, including the location of the collection and a link to the collection webpage.
There are a number of reasons why this execution is not quite practical yet, though it may be feasible in a future project.  The primary complication here is the signal to noise ratio when in London, as so many of the collections within AIM25 are relevant to London, but the geographic specificity in the collection metadata is often not detailed to the level of granularity that you get when in Street View.  If we ask for relevant collections within a 1 mile radius of a specific latitude and longitude, for instance, we may get back 200-300 collections with little clue as to why each collection is relevant to this location.
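For readers curious what a "within a 1 mile radius" query involves, here is a minimal sketch using the haversine great-circle formula. The coordinates and titles are made up, the function names are hypothetical, and this is not how the Step change service is implemented.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(points, lat, lon, radius_miles):
    """Return (distance, title) pairs inside the radius, nearest first."""
    found = []
    for title, plat, plon in points:
        d = haversine_miles(lat, lon, plat, plon)
        if d <= radius_miles:
            found.append((d, title))
    return sorted(found)


# Made-up points: one a short walk from the viewer, one several miles south.
points = [
    ("Nearby collection", 51.5101, -0.1340),
    ("Distant collection", 51.3762, -0.0982),
]

nearby = within_radius(points, 51.5080, -0.1281, radius_miles=1.0)
for distance, title in nearby:
    print(f"{title}: {distance:.2f} miles")
```

The distance only says how close a collection's geotag is. It says nothing about why the collection matters to that spot, which is the missing clue described above.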

Another problem is what we've been calling the Needle In A Haystack problem, once you get away from London and into other parts of the world.  While the AIM25 collections are largely in Greater London (sorry if I'm getting my terminology wrong--I'm an American!), there are many collections that are relevant to other parts of the world.  Rory has done an amazing job parsing out locations from the collections metadata and using Geonames to resolve these locations.  So we can now see that a particular collection may have relevance to locations in China for instance, which is one of the locations  we've been testing with.  Here, our problem is that we've got just one or two collections and they are geotagged for a small town where someone lived.  So unless we set a really large bounding box, unless you happen to be in Street View in that town, you'd never learn about that collection, even though it has documents pertinent to many locations in China and Tibet.

Friday, 8 June 2012

Development of CALM 9.3: towards a simple linked data indexing tool

One of the key components of our Step change project is the development of Axiell's software products CALM and CALMVIEW to support some linked data functionality. Both applications are widely used within the UK archival community. The current public release of CALM is version 9.2, and I have been supplied with a release of 9.3 to use with our catalogue data in a test environment here.

There has been for some years a structure of keen and active CALM user groups and a formal dialogue between these and Axiell in regard to ongoing refinement and development of these products. Very simply, within the time and resource confines of the Step change project, we are looking to develop the following tools that will:
  1. allow CALM users to interrogate linked data services, return results and insert selected URIs into CALM records
  2. allow these links to be displayed in the CALMVIEW web front end to the CALM application
  3. allow CALMVIEW to expose data from the CALM application in RDF so that it in turn can be interrogated by other services
In this entry I will be discussing development of point 1 (points 2 and 3 will emerge a little later in the project). Basically there are two aspects to the functionality in v.9.3 i) administration facility to allow configuration/testing of current and new linked data services and ii) user's facility to link URIs into CALM records. Let's look at the administration part first:
From the main administration menu, we go to a Linked Data submenu and can see the default linked data services which have been configured. These are AIM25, British Library British National Bibliography (BNB) and Wikipedia (DBpedia). At this point we can add, remove or edit/test these services.

We can also see how the link to AIM25 is initially configured:

Next we can see the XSLT used to transform XML received from the selected service into CALM's XML.

At this point we can also select the test function and send some text to search the service and see what is returned. Here we have searched the AIM25 service with the text 'Churchill' and can see some XML returned.

This is transformed by the XSLT to CALM XML

and we can see the results processed below:
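To make the transform step a little more concrete, the sketch below does the same kind of mapping in Python rather than XSLT: it reads XML returned by a service and rewrites it into a target structure. Both XML shapes are invented for illustration; the real AIM25 response and CALM's XML are not reproduced here, and the element names (`results`, `concept`, `LinkedData`, `Link`, `Term`, `URI`) are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical service response: one <concept> per match.
SERVICE_XML = """<results>
  <concept uri="http://example.org/id/person/1"><label>Churchill, Clementine</label></concept>
  <concept uri="http://example.org/id/person/2"><label>Churchill, Winston</label></concept>
</results>"""


def to_target_xml(service_xml):
    """Map each <concept> in the response to a <Link> with a Term and a URI."""
    source = ET.fromstring(service_xml)
    target = ET.Element("LinkedData")
    for concept in source.findall("concept"):
        link = ET.SubElement(target, "Link")
        ET.SubElement(link, "Term").text = concept.findtext("label", default="")
        ET.SubElement(link, "URI").text = concept.get("uri", "")
    return target


target = to_target_xml(SERVICE_XML)
print(ET.tostring(target, encoding="unicode"))
```

An XSLT stylesheet expresses the same mapping declaratively, which is why each new service needs its own transform written for it.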

You can also use the Admin menu to determine which databases in CALM can be potentially linked to that service - you are not restricted to your catalogue database for example. So that's the admin part. What about linking a CALM catalogue record to a URL from one of these services? Well, in the catalogue menu, you select the Authorities menu on the left, and you now see a Linked Data button.


Now you decide which service to link to:
You now decide which fields you would like the service to search against, and you can add your own free text to generate extra searches. Here I've selected the title field which contains the text 'Clementine Churchill', which is what I want to search against AIM25:


Now I see the results coming back from AIM25, and I've selected the Clementine Churchill person authority
And I then use the utility to post this URL into the CALM catalogue record
Here I've done the same with the DBpedia service - you can see I get two Clementine Churchill topic returns. These actually turn out to be duplicates after checking the URLs so I end up having to delete one from the catalogue record...
Here's the catalogue record with links to both services in
Testing is all in the very early stages but some interesting (and I think important) points have emerged already:

1. Configuring or adding new Linked Data services to CALM: Axiell have designed this process to be generic and extensible. However to do this requires a knowledge and ability to write XSLT which will be beyond most archivists (me included!). So where does this leave us? A couple of options will likely emerge. Firstly, new services may be added in CALM upgrades through the User Group requests for enhancements process. Secondly, those CALM users with access to technical assistance may well be able to produce the requisite XSLT transforms necessary to add new services and share these through the User Group process.

 2. Making CALM find the internet: The Linked Data function in CALM means that the application now needs to reach the internet to do its searching. This is potentially problematic and will very much depend on the particular user technical environment and how user access to the internet is configured. Axiell, myself and the technical support for Cumbria County Council ICT have spent some considerable time testing and configuring to make this happen. Here in Cumbria we use a proxy server to access the internet, and users have to enter username and password credentials when prompted. The proxy server then compares these credentials against a database, and if correct, allows access to the internet. Axiell have therefore had to develop CALM to prompt for these internet credentials if required, and we have found this works. In the near future we are likely to upgrade to a new proxy server which will check the user's Active Directory credentials and work straight off these, providing a cleaner, simpler route to the internet, and this should bypass the need to enter your internet credentials in CALM. However, until CALM 9.3 has been tried in a range of other user environments, it will not be possible to guarantee that internet access will work everywhere.

3. Finding and determining meaningful results: Even after just a few minutes' use one can see some interesting issues emerge. The BL BNB service doesn't usually return anything from the text found in the fields of most of our CALM catalogue records, as it interprets text within each field as a complete search term. However, using the freetext extra search box (without any CALM fields selected) to enter a crisper, more precise word or string of text usually does work. The DBpedia results seen in CALM from a search certainly need to be treated with caution. For example, when parsing a catalogue record relating to some local police documents with fields which contained various text such as 'Whitehaven Police Station', 'Crime', 'Law', I had results back from DBpedia as follows: 'Whitehaven', 'Crime', 'Law'. The Whitehaven URL related in fact to an entry about Whitehaven Railway Station, the Crime URL related not to an entry about criminality but to an entry for a Californian rock band called Crime, and the Law URL related not to legislation but to an entry about the actor Jude Law. Hmmmm....all good food for thought about future work in this area.

4. Useful services?: As things stand, three default services were configured because they were readily accessible and of some possible use to archivists and archive users. But the key services for archivists are as yet still in development or not readily accessible (I'm thinking of things like National Register of Archives, Manorial Documents Register etc). And there are major issues of course relating to person/corporate name authorities or geo name authorities which are essential areas for development, possibly by way of some sort of national brokering service to make these available in the same way that AIM25 is providing a sort of de facto UK Archival Thesaurus (UKAT) brokering service.
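On point 3 above, one simple way to flag doubtful matches for review is to compare the words in the catalogue field with the words in each returned label. The sketch below is only an illustration of that idea: the candidate labels are made up, the 0.6 threshold is arbitrary, and nothing like this is claimed to exist in CALM.

```python
import re


def tokens(text):
    """Lower-cased set of the words in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(field_text, candidate_label):
    """Jaccard overlap between the words of a field and a candidate label."""
    a, b = tokens(field_text), tokens(candidate_label)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def review_candidates(field_text, labels, threshold=0.6):
    """Return (label, score, needs_review) tuples, best match first."""
    scored = [(label, overlap_score(field_text, label)) for label in labels]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(label, round(score, 2), score < threshold)
            for label, score in scored]


# Hypothetical labels a service might return for two catalogue fields.
station = review_candidates(
    "Whitehaven Police Station",
    ["Whitehaven railway station", "Whitehaven"],
)
law = review_candidates("Law", ["Jude Law", "Law"])
for row in station + law:
    print(row)
```

Word overlap catches partial matches such as a railway station offered for a police station, but it cannot tell that an entry labelled simply 'Crime' is about a rock band. That still needs a person looking at a preview before the link is saved.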

I'll be spending the next few weeks using CALM 9.3 to provide links to many hundreds of catalogue records, so we can see the results in a test version of CALMVIEW and which will allow proper user testing.

Thursday, 7 June 2012

Historypin and Step change: Collections in Context

The not-for-profit behavior change agency We Are What We Do is known for creating simple yet compelling tools that encourage people to change their behavior in small ways that amount to big impacts in areas like waste reduction, childhood obesity, social isolation, etc.


Historypin is a project we created together with Google and several cultural memory institutions to help bridge the gap between cultures and generations and help rebuild and strengthen ties within communities. We created simple tools that could be used by individuals, schools, communities, and institutions to create a shared view of the layers of history that make up a community.

It's been nearly a year since the official launch of Historypin and we've experienced tremendous growth, with hundreds of cultural heritage institutions adding content and creating Channels, and tens of thousands of individual users joining the site and adding their own content.  It's in this growth that we've started to realize the potential for academic research of the blending of multiple collections, and the opportunity to incorporate other sources of content in more automated ways.  We've just begun to scratch the surface of what will be possible in years to come.

Our role in the Step change project is to explore options for incorporating AIM25 collection holdings into the geographic landscape of scores of historical photos and audio/video recordings; and to assist researchers, scholars, and the general public in the discovery and contextualization of these holdings.

In the coming weeks, we'll explore some of the user interface prototypes we're testing and document the challenges and successes we meet along the way.