Linking research & learning technologies through standards

Link Affiliates Blog

Metadata-less ANDS

Guest blog post by Lyle Winton, VERSI

I’ve seen quite a few ARDC (Australian Research Data Commons) ideas that will use existing digital records to create a nice metadata-full context around research datasets. Many of these records will have to be “cleaned up”, or will need new processes to ensure more complete metadata. Having worked as a researcher, I realise institutions already collect bits of this stuff (people, grant and publication info), but there are still a lot of activities and projects which probably don’t have corporate records. So I fear that where the metadata-full approach meets normal research practice, the result will be more reporting and/or more metadata entry for researchers.

This leads me to an idea (still half-baked), and it’s based on two premises: ARDC is essentially about good discovery, not necessarily good metadata; and heavy reliance on manual entry of metadata is either expensive or patchy. (Feel free to disagree with my premises.)

Somewhat following the Google approach of “linking text” being more important than metadata: at the time of dataset registration you could “link” (essentially attach) as much unstructured text around the dataset as possible.

A scenario: Joanne Bloggs registers a numerical dataset from a research survey. In the process she attaches an email thread between herself and the data collectors, a grant application that’s in progress, and several loosely related papers in PDF and Word formats. Provided these “attachments” are private and only used for text-based searches (e.g. free text search, semantic network analysis), it wouldn’t matter much which files you upload, how many there are, how they are structured, or possibly even exactly how relevant they are. Let your (Google-like) search engine figure it out.

I think this addresses the issue of the time-poor researcher who doesn’t want to enter metadata, whose dataset isn’t self-descriptive, and who doesn’t mind dumping a few files they have lying around their desktop into a private area. So I could foresee two types of records in the ARDC: one is a curated record with structured metadata around valuable research datasets (the usual thinking), and the other is essentially a title plus a link or “contact Joanne Bloggs” message that can still be easily and effectively discovered through an associated (but hidden) text cloud. Would people use that?
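
To make the second kind of record a bit more concrete, here is a rough Python sketch of how it might work. It is only an illustration under my own assumptions, not an ARDC design: the names `DatasetRecord` and `HiddenTextIndex` are hypothetical, the attachments are assumed to have already been converted to plain text, the sample records are made up, and a simple word-count inverted index stands in for a real (Google-like) search engine. The point it tries to show is that the attached text is indexed for discovery but never returned; a search only ever exposes the title and the contact.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Public face of a registered dataset: just a title and a contact."""
    title: str
    contact: str
    # Plain text of the private attachments (hypothetical field).
    # It is indexed for search but never handed back to searchers.
    attachments: list = field(default_factory=list, repr=False)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class HiddenTextIndex:
    """Inverted index: each word maps to the records whose text contains it."""

    def __init__(self):
        self.records = []
        # word -> {record id: number of occurrences}
        self.postings = defaultdict(lambda: defaultdict(int))

    def register(self, record):
        """Store the record and index its title plus all attached text."""
        record_id = len(self.records)
        self.records.append(record)
        for text in [record.title, *record.attachments]:
            for word in tokenize(text):
                self.postings[word][record_id] += 1
        return record_id

    def search(self, query):
        """Return (title, contact) pairs, ranked by query-word occurrences."""
        scores = defaultdict(int)
        for word in tokenize(query):
            for record_id, count in self.postings.get(word, {}).items():
                scores[record_id] += count
        ranked = sorted(scores, key=lambda rid: (-scores[rid], rid))
        return [(self.records[rid].title, self.records[rid].contact)
                for rid in ranked]


# Made-up example data, purely for illustration.
index = HiddenTextIndex()
index.register(DatasetRecord(
    title="Survey dataset",
    contact="Joanne Bloggs",
    attachments=["Email thread: the rainfall questionnaire went out to farmers."],
))
index.register(DatasetRecord(
    title="Telescope images",
    contact="A. N. Other",
    attachments=["Grant application on radio astronomy."],
))
results = index.search("rainfall farmers")
```

Here the query words appear only in Joanne’s private email thread, yet the search still leads to her record, and all the searcher sees is the title and who to contact. A real service would need proper text extraction from PDF and Word files and a much better ranking than raw word counts.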

Written by lylewinton

November 7, 2009 at 6:10 pm
