Posts Tagged ‘digitisation’
The Visible Archive
The e-Research Australasia conference, which recently concluded in Sydney, had data visualisation as one of its major focuses. George Djorgovski’s plenary talk [abstract, presentation] on virtualisation in science posed the urgency for visualisation in stark terms: as the models that science is called on to build get more and more complex, not only will most data being gathered (in the “data avalanche”) never be seen by human eye; even the models needed to make sense of the data are starting to surpass human understanding, and can only be turned over to machines to deal with.
Someone will greet the notion that we will turn over our comprehension of the world to machines as a challenge. Your Humble Correspondent is more inclined to think of SkyNet…
The highlight of the conference for me, at least, was Mitchell Whitelaw’s Exploring Archival Collections with Interactive Visualisation, presenting two prototype visualisations of the contents of the National Archives of Australia. The first visualisation gives an overview of all 57,000 series in the National Archives, according to their sizes (in number of items and shelf-space), and starting date. That visualisation allows critical dates to emerge from the data—the disproportionate importance of 1901, 1914 and 1939 in Australian archive-gathering, for example; and the visualisation also highlights relations between different series graphically. But although getting 57,000 complex data points into a single 2-D diagram has its appeal, there were no real surprises there.
The second visualisation I found much more interesting. It uses tag clouds for the titles of the 65,000 records contained in a single archive series. Tag clouds have justly been called the mullets of Web 2.0 (as far back as 2005; strange to think there was a Web 2.0 as long ago as 2005!). But Whitelaw’s use of tag clouds, helped along with plenty of Java, is the most intelligent I’ve seen in a while.
One very handy piece of interactivity is that you can choose to ignore particular tags which are crowding out their peers in the cloud; if in a particular archive half the titles contain “Naturalisation” or “Citizenship”, then in most contexts those words will have no more interest for you as a researcher than instances of “the” or “and”: they become stop-words in the context of that archive. Choosing to eliminate those recurring words reveals the real diversity of topics in the archive, startlingly. The effect is like putting on glasses: fuzzy tags on the periphery of the tag cloud, blotted out by one or two stop-words, suddenly come into focus.
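To make the stop-word idea concrete, here is a minimal Python sketch. It is not Whitelaw’s code (his visualisations are written in Java); the titles are invented for illustration, and the function name `tag_counts` and its `ignored` parameter are assumptions of mine. It counts how many titles each word appears in, then recounts with one dominant word excluded.

```python
import re
from collections import Counter


def tag_counts(titles, ignored=frozenset()):
    """Count the number of titles each word appears in, skipping ignored words."""
    counts = Counter()
    for title in titles:
        # Lower-case and split on non-alphanumerics; count each word once per title.
        words = set(re.findall(r"[a-z0-9]+", title.lower()))
        counts.update(words - set(ignored))
    return counts


# Invented sample titles, not real National Archives records.
titles = [
    "Naturalisation certificate Smith",
    "Naturalisation application Jones",
    "Naturalisation Darwin cyclone relief",
    "Darwin cyclone damage report",
]

everything = tag_counts(titles)
# Treat the dominant word as a stop-word for this archive only.
filtered = tag_counts(titles, ignored={"naturalisation"})

print(everything.most_common(3))
print(filtered.most_common(3))
```

In a real tag cloud the counts would drive the font size, so dropping a dominant word and rescaling is what lets the smaller tags grow into view.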
But what really distinguishes Whitelaw’s tool is that it can explore collocations of words in the titles. Clicking a tag draws lines to all the other tags it co-occurs with in the same title: the more frequent the co-occurrence, the thicker the line. So you can straightforwardly get a sense of what contexts a particular word comes up in, with an accompanying bar chart giving the chronological distribution of those contexts. As Whitelaw shows, clicking on “Darwin” in his example archive draws prominent lines to “1937” and “cyclone”, which brings the fact of the 1937 Darwin Cyclone [PDF] bubbling up out of the data. The visualisation allows the user to drill down to digitisations of the individual archive records that the tag cloud collocations expose. (The metadata to the 1937 cyclone archives are online.)
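The collocation step can be sketched in a few lines of Python too. Again, this is an illustration under my own assumptions rather than Whitelaw’s implementation: the titles are made up, and `cooccurrence` and `neighbours` are hypothetical helper names. It counts how often each pair of words shares a title, then lists the words linked to a chosen tag; the counts stand in for line thickness.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence(titles):
    """Count, for every pair of words, the number of titles containing both."""
    pairs = Counter()
    for title in titles:
        # Sorting gives each pair a single canonical (a, b) ordering.
        words = sorted(set(re.findall(r"[a-z0-9]+", title.lower())))
        pairs.update(combinations(words, 2))
    return pairs


def neighbours(pairs, tag):
    """Return the words that share a title with `tag`, with co-occurrence counts."""
    linked = Counter()
    for (a, b), n in pairs.items():
        if a == tag:
            linked[b] += n
        elif b == tag:
            linked[a] += n
    return linked


# Invented sample titles, not real National Archives records.
titles = [
    "Darwin cyclone 1937 damage",
    "Darwin cyclone 1937 relief fund",
    "Darwin wharf labour",
    "Cyclone warning Broome",
]

pairs = cooccurrence(titles)
darwin_links = neighbours(pairs, "darwin")

# The heaviest links correspond to the thickest lines in the visualisation.
print(darwin_links.most_common(2))
```

Counting every pair is quadratic in the number of distinct words per title, which is harmless for short record titles but would want rethinking for full text.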
Which all means that intelligent navigation of tags and tag collocations can expose stories directly in the documents they are drawn from, without any prepping or mediation. All done with a highly engaging interface.
Whitelaw has blogged his visualisation work at The Visible Archive, which includes downloadable Java for both visualisations (with canned data). The Darwin 1937 Cyclone is not the only fact that emerges out of the tag clouds, and we encourage you to go exploring yourselves.
Building e-Humanities infrastructure
Reflections on the e-Humanities workshop, University of Melbourne eScholarship Research Centre, 2009-08-12
Building generic ICT infrastructure to support humanities research seems to be a difficult task. The standard approach is to
- collect a bunch of usage stories from different communities
- infer common business processes based on those stories
- build infrastructure that supports those business processes
The theory is that a community would then take the generic infrastructure and customise it to meet their particular needs. The problem is that there is something about the humanities that makes generic business processes hard to find.
We’ve blogged previously about the Project Bamboo approach to finding generic e-Humanities business processes. Project Bamboo certainly had difficulty converting its scholarly narratives into common recipes. Maybe there aren’t any processes common to the different strands of humanities research? Unlikely. Rather, the fierce independence of humanities researchers makes it difficult to infer commonalities. Suggesting to a humanities researcher that she might have a research process in common with her peers carries with it an implication that her research is not unique. Even uttering the phrase “business process” can put humanities researchers offside (some of them conflate business and commerce).
In this context, there was a little nervousness leading up to the Interconnections and Services in the eHumanities: Reflecting on Current Initiatives workshop hosted by the University of Melbourne eScholarship Research Centre on 12 August.