The title of this post is a question from our evaluator David Kay that I somewhat failed to answer in my last post, so here goes.
Scalability of OpenCalais use (some numbers)
We are using the OpenCalais web service (http://www.opencalais.com/documentation/calais-web-service-api) on a free basis. OpenCalais allows 50,000 transactions per day per user on this basis, and anyone using the OMP workflow prototype will be regarded as the same user. When running the analysis in the workflow, each ISAD(G) element that is selected for analysis represents a single OpenCalais transaction.
There are currently 23 text boxes that can be selected for analysis. However, during focus group meetings with the AIM25 archivists it was suggested that only 2 or 3 of these would be likely to yield potentially meaningful analysis.
In the last 9 or so years AIM25 has accrued records for 15,335 collections. So some back of envelope maths would tell us:
Rate of record addition: 15,335 records / 9 yrs is roughly 1,700 records/year, or roughly 4.7 records/day
Even in the unlikely event that someone analysed all 23 text areas of every new record, that would be just over 100 transactions per day. So there is some wriggle room in the 50,000 limit for edits, re-analysis, etc.
Of course, the reality of how archivists use AIM25 is probably not very well represented by those numbers. The potential constraint placed on archivist throughput by OpenCalais analysis (i.e. the maximum number of records that could be analysed in a day, if all 23 text boxes were analysed for each) would be about 2,174 (50,000 / 23). I'll let others who are more qualified comment on whether there has ever been, or may be in the future, a throughput that exceeds that.
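For anyone who wants to rerun the back-of-envelope maths with different assumptions, here is a minimal Python sketch. The constants are the figures quoted in this post; the function names are mine and purely illustrative. It assumes exactly 9 years of 365 days, which gives about 4.67 records/day, but the post only says "9 or so years", so treat the output as rough.

```python
# Figures quoted in the post; adjust to test other scenarios.
DAILY_LIMIT = 50_000   # free OpenCalais transactions per day per user
TEXT_BOXES = 23        # ISAD(G) text boxes that can be selected for analysis
COLLECTIONS = 15_335   # AIM25 collection records accrued so far
YEARS = 9              # "9 or so years", so this is approximate


def records_per_day(collections=COLLECTIONS, years=YEARS):
    """Average number of records added per calendar day."""
    return collections / (years * 365)


def transactions_per_day(elements_per_record):
    """Average daily transactions if every new record has this many elements analysed."""
    return records_per_day() * elements_per_record


def max_records_per_day(elements_per_record, limit=DAILY_LIMIT):
    """Whole records that fit inside the daily limit (one transaction per element)."""
    return limit // elements_per_record


if __name__ == "__main__":
    print(f"records/day: {records_per_day():.2f}")
    print(f"transactions/day, all {TEXT_BOXES} boxes: {transactions_per_day(TEXT_BOXES):.0f}")
    print(f"max records/day, all {TEXT_BOXES} boxes: {max_records_per_day(TEXT_BOXES)}")
    print(f"max records/day, 3 boxes: {max_records_per_day(3)}")
```

Counting whole records only, the worst case comes out at 2,173 per day, since the 2,174th record would tip just over the limit. If archivists only analyse the 2 or 3 elements suggested by the focus group, the ceiling rises to well over 16,000 records per day.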
Robustness of processing against OpenCalais (some churn)
The analysis does take some time and, as one would expect, the time taken increases with the number of elements being analysed and the length of the text blocks sent for analysis. The prototype leaves the annotation and processing of the result RDF to the JavaScript element of the AJAX process. This means that there is a reliance on the performance of the client machine, which is an unknown.
For the largest block of text in the current system (56,544 characters), the browser-side processing did become mired in "Unresponsive script" messages. The request to OpenCalais was not the thing causing the lag, so the culprit was the browser-side processing of the result. All this would suggest that more of this post-response processing should be pushed over to the server.
A move to more server-side processing would also improve extensibility of the framework. Server-side brokerage of results from a range of services would allow for a more consistent response both for AIM25 workflow and for any potential third party clients.
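To make the brokerage idea a little more concrete, here is a minimal Python sketch of what a server-side broker might look like. It is not the prototype's code and it does not call OpenCalais. The `Entity` fields, the `broker` function and the stub service are all hypothetical names chosen for illustration. The assumption is that each service can be wrapped as a function that takes a block of text and returns a list of simple records, with any RDF parsing done inside that wrapper on the server.

```python
from dataclasses import dataclass, asdict
from typing import Callable, Dict, List


@dataclass
class Entity:
    """One normalised result, in the same shape whichever service produced it."""
    element: str  # ISAD(G) element the text came from
    text: str     # the matched text
    kind: str     # entity type as reported by the service
    source: str   # name of the service that found it


# Hypothetical wrapper signature: text in, list of {"text": ..., "kind": ...} out.
Service = Callable[[str], List[dict]]


def broker(elements: Dict[str, str], services: Dict[str, Service]) -> List[dict]:
    """Send each selected element to each service and return one flat, consistent list."""
    results = []
    for element, text in elements.items():
        for name, analyse in services.items():
            for raw in analyse(text):
                entity = Entity(element, raw["text"], raw["kind"], name)
                results.append(asdict(entity))
    return results


def stub_service(text: str) -> List[dict]:
    """Stand-in for a real service wrapper: flags capitalised words only."""
    return [{"text": w, "kind": "Capitalised"} for w in text.split() if w[:1].isupper()]


if __name__ == "__main__":
    selected = {"scopecontent": "Papers of King's College London"}
    for row in broker(selected, {"stub": stub_service}):
        print(row)
```

The point of the design is that the browser, or a third party client, only ever sees the small normalised list, so the heavy lifting no longer depends on the performance of the client machine.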
JISC Step change will create Linked Data architecture for the UK archive sector, completing in July 2012. It draws on the lessons of Open Metadata Pathway and brings together King's College London Archives, ULCC, Axiell, Cumbria Archive Service, Historypin and the charity, 'We are what we do'. The project will use data held by AIM25 and focuses on delivering a new UKAT Web service and toolset that will allow archivists to mark up catalogues with triples and other semantic entities.