Building e-Humanities infrastructure
Reflections on the e-Humanities workshop, University of Melbourne eScholarship Research Centre, 2009-08-12
Building generic ICT infrastructure to support humanities research seems to be a difficult task. The standard approach is to:
- collect a bunch of usage stories from different communities
- infer common business processes based on those stories
- build infrastructure that supports those business processes
The theory is that a community would then take the generic infrastructure and customise it to meet their particular needs. The problem is that there is something about the humanities that makes generic business processes hard to find.
We’ve blogged previously about the Project Bamboo approach to finding generic e-Humanities business processes. Project Bamboo certainly had difficulty converting its scholarly narratives into common recipes. Maybe there aren’t any processes common to the different strands of humanities research? Unlikely. Rather, the fierce independence of humanities researchers makes it difficult to infer commonalities. Suggesting to a humanities researcher that she might have a research process in common with her peers carries with it an implication that her research is not unique. Even uttering the phrase “business process” can put humanities researchers offside (some of them conflate business and commerce).
In this context, there was a little nervousness leading up to the Interconnections and Services in the eHumanities: Reflecting on Current Initiatives workshop hosted by the University of Melbourne eScholarship Research Centre on 12 August.
The aims of the workshop were twofold: to share recent activity, and to sketch a proposal for possible future development of e-Humanities infrastructure in Australia.
The aim of sharing information was certainly met. Invitees from the University of Melbourne eScholarship Research Centre, Link Affiliates, the National Library of Australia (NLA), the National Archives of Australia (NAA) and the Australian National Data Service (ANDS) discussed:
- The digitisation cradle used to present the NLA’s newspaper service. This service presents digitised images of pages from Australian newspapers between 1803 and 1954. Users can select articles within a page and view them alongside an OCR translation and descriptive tags. To quote from the website:
“Automatically extracting text from scans of old newspapers is extremely challenging. Although this project is using the best available Optical Character Recognition (OCR) software, the condition of the images it has to process combined with the frequently small fonts used means that many errors of interpretation are made.”
To tackle this problem, the digitisation cradle allows users to edit both the OCR translation and the tags. This has allowed the NLA to successfully crowd-source fixes to their OCR text translations and metadata about each article. The top text-correctors even have a hall of fame, some of them correcting more than 20,000 lines a month!
- People Australia services that exchange data with other biographical identity services. People Australia identifies and publishes metadata about people and organisations described in Australian library holdings. For example, the record for the Australian Ballet provides a biography of the ballet, links to related people and organisations, and links to related resources from other NLA services. Maintaining a database of all “significant Australians” is a mammoth task. For that reason, People Australia augments its records with information collected from authoritative external identity services. Information from external suppliers is encoded using the Encoded Archival Context (EAC) format and periodically harvested using OAI-PMH. In return, People Australia publishes unique, persistent, resolvable identifiers for people and organisations for use by external identity services. For example, the People Australia record on Nellie Melba (http://nla.gov.au/nla.party-505278) contains bibliographical information harvested from both Music Australia and the Australian Women’s Register. These external services in turn use Nellie’s People Australia identifier to reference resources in the National Library’s services.
- Link Affiliates’ analysis of common services supporting collaborative biographical encyclopaedias like the Australian Women’s Archive. The analysis shows how eHumanities infrastructure supporting one domain can be built by composing smaller, more generic groups of services (like annotation and syndication services). This work is yet to be published, but see http://blog.linkaffiliates.net.au/2009/08/14/modularising-the-e-framework/ for an overview of the approach used, and http://linkaffiliates.net.au/Activities/bamboo.html#Stage2 for a preview of the service analysis.
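To make the People Australia harvesting arrangement a little more concrete, here is a rough Python sketch of one OAI-PMH harvesting step: building a ListRecords request and reading record identifiers and the resumption token from a response page. The verb, the `metadataPrefix`, `from` and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH 2.0 protocol. The base URL, the `eac` prefix value and the function names are assumptions made up for illustration; they are not taken from the NLA's actual services.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, from_date=None,
                     resumption_token=None):
    """Build a ListRecords request URL for a (hypothetical) supplier."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Restrict a periodic harvest to records changed since from_date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty or missing resumptionToken means this was the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token
```

A harvester would fetch the first URL, call `parse_page` on the response, and keep requesting with the returned token until it comes back as `None`. The EAC payload inside each record's metadata element would then be parsed separately.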
The second aim of the workshop was to sketch a proposal for possible future development of eHumanities infrastructure in Australia. The ANDS folks couldn’t go into too much detail about funding until their business plan is released (any time now), but described the sort of infrastructure projects they would like to see funded. Essentially ANDS will not fund digitisation efforts, but is interested in services that bring digitised data “into the commons”. With this in mind, the workshop invitees sketched out a proposal involving:
- generalising the NLA digitisation cradle so that it could be used for content other than newspapers (the NLA code is all open source), and
- using the digitisation cradle to present and crowd-source refinement of two collections: the Australian Popular Fiction Archive, and the National Archive of Australia’s World War I online service records.
Other more “fine grained” ideas emerging from the workshop included:
- contributing People Australia experience to an upcoming review of EAC
- harmonising the EAC and ISO 2146 views of parties used by the ANDS collection registry
- exposing People Australia identity data as RDF triples
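As a rough idea of what the last item might look like, the Python sketch below serialises a person record as N-Triples. The Nellie Melba identifier is the one quoted earlier in this post. The choice of the FOAF and OWL vocabularies, the function names and the example.org link are assumptions for illustration only; the workshop did not settle on a vocabulary.

```python
# Well-known vocabulary IRIs; choosing FOAF and owl:sameAs is an assumption.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def person_triples(party_iri, name, same_as=()):
    """Return N-Triples lines describing one person.

    same_as lists identifiers for the same person in external services.
    """
    subject = iri(party_iri)
    triples = [
        (subject, iri(RDF_TYPE), iri(FOAF + "Person")),
        (subject, iri(FOAF + "name"), literal(name)),
    ]
    triples += [(subject, iri(OWL_SAME_AS), iri(other)) for other in same_as]
    return [" ".join(triple) + " ." for triple in triples]


if __name__ == "__main__":
    lines = person_triples(
        "http://nla.gov.au/nla.party-505278",
        "Nellie Melba",
        same_as=["https://example.org/people/melba"],  # hypothetical external record
    )
    print("\n".join(lines))
```

Using the persistent People Australia identifier as the subject of every triple is the point of the exercise: external services can then make statements about the same person simply by reusing that identifier.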


