Position Paper: ADL Learning Content Registries and Repositories Summit
Link Affiliates will be participating in the ADL Learning Content Registries and Repositories Summit, to be held in Washington DC on April 13-14 2010:
There have been numerous learning content registry and repository projects. This summit aims to bring together participants to determine “where are we” and “what’s next” for learning content registries and repositories, dealing with business, policy and technical issues. The summit is targeted to those who develop, deploy or use registries and repositories to manage and deliver learning content along with users who develop and publish learning content or want to find it.
Rather than just submitting a position paper to the summit, we thought we would share our thoughts here on some of the trends we see happening in repositories and repository federation: the Googlification of repositories, open interfaces, repository mandate and user needs, and registry-of-registry approaches to repository federation.
Our repository experience
Link Affiliates (as readers of this blog know) is a group working on promoting and facilitating the adoption of standards in e-learning and e-research in Australia. Link Affiliates has undertaken several projects and collaborations over the last few years relevant to registries and repositories; these include:
- The Federated Repositories in Education (FRED) project (2007), establishing requirements and priority service specifications for CORDRA-style repository federations in Australia.
- The Persistent Identifiers Linking Infrastructure (PILIN) project (2007-2008), establishing requirements, specifications, and prototypes for persistent identifier services.
- The Learning Content Exchange & Discovery activity within the Technical Standards in Education project, which is investigating standards and opportunities for learning content repositories and registries in the Australian K-12 school sector.
- Participation in the IMS Learning Object Discovery and Exchange initiative, establishing a reference model and standards for description and discovery of learning object repositories, and of learning objects themselves through those repositories.
- Helping facilitate the Global Registries Initiative, looking to establish registries of registries of research data and publications.
- Representation on the advisory group for LORN, a learning object repository federation for the vocational education sector in Australia.
What works
User Interfaces
Repository-based search for resources continues to be vital in education, as with other sectors: repositories fill a niche of providing high quality, vetted resources—in contrast to the Open Web. However, Googling the Open Web is overwhelmingly easy, and so is the default way to discover resources in most sectors. For users to choose to use repositories instead of the Open Web, repositories must be compellingly better: they must do a lot more for the user than Google does, with a better fit to their domain-specific needs. Repositories have to keep asking themselves: “Why not Google?”
In e-Learning, the compelling case for repositories is made by a combination of the following, and repositories should be ensuring that they do a visibly better job than Google search for these requirements:
- Richer searches than Google’s
- Higher-quality metadata
- Discovery aligned to the domain’s conceptual structures: for e-learning, this means discovery based on the curriculum, competencies, year level, etc.
- Explicit licensing of resources, to enable and encourage reuse
- Exposure of the complex relations between resources: FRBR relations (versions, formats, copies), resources with related subject matter (by curriculum coverage, by authorship, by audience)
- Social context around resources (recommendations, user comments, ratings)
- Explicit authority for the resources, indicating who is responsible for the content and when it was last updated: this gives users assurance about resource quality, relevance, and where to give feedback, which can be obscured if content is redistributed
- Guaranteed persistent storage and curation
Many of these benefits can be illustrated by the repository federation LORN, which we already mentioned. This is a screen capture of a LORN search for “hairdressing”:
The search provides Details, Preview, and Download views of resources, which are different presentations (Manifestations) of the resource. Licensing arrangements are explicit (AEShareNet codes), as are the creators and distributors of the resources (their authority). The relations between resources and the conceptual structures of the domain are exposed as facets, in the left panel (competencies satisfied, licences, types of resource, etc.)
The interface’s use of facets to expose metadata, as seen on eBay, acknowledges an important trend: people’s experience of discovery in repositories should be consistent with what they are already familiar with from the Open Web, rather than forcing unfamiliar interfaces on them. Repository metadata may be of better quality and more explicitly defined than the folksonomies of YouTube and Flickr, but it is most usable if it is presented to end users in the same way. Metadata structures should be revealed to users subtly rather than foregrounded: the metadata, after all, is only a means to the end of better discovery. LORN and federations like it are already putting these practices into effect.
The Google search lesson also extends to the initial discovery query. The LORN results shown were retrieved by a default keyword search for “hairdressing”, rather than targeting specific metadata fields. Over the past decade, library systems have been slowly transitioning from field-specific searches, harnessing explicit metadata, to Google-inspired general keyword searches. There are users who still want rich metadata search, and there are circumstances where such search is needed. But for most searches it is sufficient to provide keyword search with good results ranking and faceted reveal of further metadata. Once people have access to Simple Search, surveys have shown that they rarely (though not never) find they need to go to Advanced Search. This reflects people’s general experience of search interfaces. (For the National Library of Australia’s transition to Simple Search, see Evaluating the public library portal: Report, 2005.)
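As a rough illustration of "keyword search first, facets revealed afterwards", here is a minimal Python sketch. It is not how LORN is implemented: the records, field names and licence labels are invented for the example, and results ranking is left out altogether.

```python
from collections import Counter

# Hypothetical records, invented for illustration only.
RECORDS = [
    {"title": "Hairdressing: cutting techniques",
     "description": "Video on basic cutting",
     "licence": "licence-A", "type": "video"},
    {"title": "Salon safety",
     "description": "Hairdressing workplace safety checklist",
     "licence": "licence-B", "type": "document"},
    {"title": "Welding basics",
     "description": "Intro to welding",
     "licence": "licence-A", "type": "video"},
]


def keyword_search(records, query):
    """Return records whose title or description contains every query term."""
    terms = query.lower().split()
    hits = []
    for rec in records:
        text = (rec["title"] + " " + rec["description"]).lower()
        if all(term in text for term in terms):
            hits.append(rec)
    return hits


def facet_counts(hits, fields):
    """Count the values of each metadata field across the hits.

    These counts are what a facet panel would show next to the results.
    """
    return {field: Counter(rec[field] for rec in hits) for field in fields}


hits = keyword_search(RECORDS, "hairdressing")
facets = facet_counts(hits, ["licence", "type"])
```

The user types a single keyword and never sees a field-specific form; the metadata only surfaces afterwards, as counts they can click on to narrow the result set.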
What doesn’t work
Solutions in search of problems
Repository federations are successful when they emerge in response to a clear community need, and have an explicit mandate to address that need. We believe LORN is successful because those conditions obtain—and have driven collaboration between State-based repositories in Australia to make the federation work.
But a mandate on its own is not enough: repositories again need to demonstrate they fill a compelling need, which is an organisational rather than a technical challenge.
University repositories have long struggled to recruit content, for example, and resort to graduate students as a captive source of content—not because the repositories lack the mandate, but because content providers (academics) don’t see a compelling need: disciplinary repositories like arXiv, and departmental web pages, already fulfil their immediate needs of content dissemination. arXiv arguably fulfils many of the other repository requirements for the domains it covers; and unlike a single university repository, it has a critical mass of content drawn globally. The university repository has to make quite an argument to persuade the academic to use it instead.
That means repositories compete to register the same content, which leads not to more registration, but less. Rather than compete, registries need to look to leverage each other’s holdings, whether through formal agreement or open access: they need to be harvesting content from each other more effectively.
To give another example of a mismatch between repositories and users: the Monash University institutional repository registers crystallography datasets produced at the university. The institutional repository has generic Dublin Core metadata, befitting its target audience of library users. That metadata however is not useful to the core audience of the dataset, crystallographers—who are more interested in diffractometer types than resource types. As a result, the crystallographers have stored their content in a different repository federation, which supports their more complex metadata requirements. Again: if the metadata profile supported by the repository does not meet a compelling need of its users, they will take their data elsewhere.
Repositories ultimately are an infrastructure service, and will only be used if they match their users’ needs and expectations—like any other service.
What is needed
Opening up repositories
Realistically, even if repositories emulate Google, users are still likelier to discover resources through Google, or through their own community portals, than by coming to the repository directly. Users don’t just expect the repository to feel like Google (or eBay, or increasingly Wikipedia); they expect it to be accessible to Google (and Wikipedia). The expectation is stronger in the research sector, with its culture of open access; in the institutional education sector, learning resources are typically consumed within a more controlled context. But the culture shift affects all repositories, as it is driven by their Googling users. Repositories have to keep telling themselves: “They do Google.”
Repositories that want their material openly accessible still have work ahead of them. Repositories are still a major chunk of the Dark Web, which search engines cannot reach. Contributors to repositories usually expect their content to be discoverable. Even if the content itself is not publicly accessible, they expect its metadata to be, so that outside users can decide whether to arrange for access. (This includes the model fitfully applied to copyright print publications by Google Books and Amazon.)
So resources or their metadata should be straightforwardly accessible through common protocols. “Common” means “web-wide”, not sector-specific: Google’s refusal to support OAI-PMH is a salutary lesson. Although OAI-PMH is clearly working well within the repository domain (e.g. LORN is built on OAI-PMH), it is limited to that domain: as far as Google is concerned, it is trapping repository content in the Dark Web. So while they should continue to use protocols like OAI-PMH, repositories also need to expose their content the way the Web now expects. That means embracing protocols like RSS, Atom, and OpenSearch, and exposing metadata records as URIs which web crawlers can discover and navigate.
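To make this concrete, here is a minimal sketch of exposing a handful of metadata records as an Atom feed, using only the Python standard library. The record fields, URIs and feed identifier are hypothetical, and a production feed would carry more (paging links, richer metadata); the point is only that each record gets a stable URI that a crawler can follow.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return "{%s}%s" % (ATOM_NS, tag)


def build_atom_feed(feed_id, title, author, updated, records):
    """Serialise metadata records as an Atom feed (returned as a string).

    `records` is a list of dicts with hypothetical keys:
    uri, title, summary, updated.
    """
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    feed = ET.Element(_atom("feed"))
    ET.SubElement(feed, _atom("id")).text = feed_id
    ET.SubElement(feed, _atom("title")).text = title
    ET.SubElement(feed, _atom("updated")).text = updated
    author_el = ET.SubElement(feed, _atom("author"))
    ET.SubElement(author_el, _atom("name")).text = author
    for rec in records:
        entry = ET.SubElement(feed, _atom("entry"))
        # The record's static URI doubles as its Atom identifier.
        ET.SubElement(entry, _atom("id")).text = rec["uri"]
        ET.SubElement(entry, _atom("title")).text = rec["title"]
        ET.SubElement(entry, _atom("updated")).text = rec["updated"]
        ET.SubElement(entry, _atom("link"), href=rec["uri"])
        ET.SubElement(entry, _atom("summary")).text = rec["summary"]
    return ET.tostring(feed, encoding="unicode")


# Hypothetical records and URIs, for illustration only.
feed_xml = build_atom_feed(
    "http://repository.example.org/feed",
    "Example repository: recent records",
    "Example Repository",
    "2010-04-07T00:00:00Z",
    [
        {"uri": "http://repository.example.org/record/1",
         "title": "Hairdressing: cutting techniques",
         "summary": "Video on basic cutting",
         "updated": "2010-04-01T00:00:00Z"},
        {"uri": "http://repository.example.org/record/2",
         "title": "Salon safety",
         "summary": "Workplace safety checklist",
         "updated": "2010-04-02T00:00:00Z"},
    ],
)
```

A feed like this can sit alongside an existing OAI-PMH endpoint: the repository keeps its sector-specific harvesting interface and adds a web-wide one.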
Users and projects increasingly expect resources to be straightforwardly linkable as well, through static HTTP URIs that are compatible with Linked Data and the REST view of the world. Linked Data allows third parties to enhance resources, providing their own structures on top of them. By insisting on static URIs for all resources, Linked Data and REST provide a model of content far more amenable to navigation by web crawlers: the benefits of static URIs are not restricted to the Semantic Web. In fact, the OAI-ORE project shows that repositories themselves can capitalise on Linked Data approaches to resources, to build flexible models of digital object aggregation.
- For a not very Semantic-Web approach to resource URIs, see for example how the IMDB identifier for Clash of the Titans, http://www.imdb.com/title/tt0800320/, has already been used in 13500 web pages simply as embedded text—facilitating everything from resource discovery to tweets to YouTube reviews to movie piracy.
- There has been considerable debate on whether “Linked Data” requires RDF, and what to call approaches that don’t follow all of Tim Berners-Lee’s “personal view” on Linked Data; Web of Data and small-l “linked data” have been suggested for at least the RDF-free variant of Linked Data™. The IMDB example is probably not what TBL had in mind; but as the Linked Data community agree, getting data granularly identified and openly accessible is a prerequisite for anything else to be done. Repositories can still provide vital infrastructure for others to build on, without having to commit to RDF for themselves.
Repository description
If we don’t want to confine ourselves to just one repository to find content, and we also don’t assume that everything is on Google (or that Google is the most effective way to find content), then we need to be able to find other repositories, which may have the kind of content we need. Even if we have access to lots of repositories through a federation, we may still need to gain access to other federations. That means we need to discover not just content, but repositories likely to contain relevant content, and registries likely to provide access to those repositories. To work out which repositories and registries we should be targeting, we need searchable metadata describing them.
In addition, once we have discovered the repository or registry of interest, we need to arrange access and start searching it. So before we do, we need to know what services can be used to access the repository—and we may even be able to start using those services directly from our computer. That means that the metadata we would like to search through should describe not only what is in the repositories, but how to gain access to them, and what their service interfaces are.
Several projects, such as ANDS, GRI and IMS LODE—and ADL—are exploring registry-of-registries approaches to large-scale repository federation. Such a central registry-of-registries provides infrastructure for formal repository federations to be formed. But more than that, it allows ad hoc federations to be formed. Users can select which range of repositories they want to interact with (provided they have access to them, either through Open Access or a trust federation): users can then initiate discovery across that range, without having to wait for a formal federation agreement.
If the discovery does wait for a formal agreement, the harvest-driven model of CORDRA is clearly superior, as ADL has already argued: centralised registry search in a federation gives better consistency, authority, and efficiency. But that is only one context for discovery, and a role can still be played by federated search, whether because a user wants to form an ad hoc federation for their one-off search, or to do rapid prototyping of a formal federation. Federated search remains an alternate model of discovery which works better in some contexts, just as discovery through Google does.
Registry-of-registry approaches provide more flexible discovery, and a wider catchment for discovery. This vision was realised quite early on in the OpenDOAR project, which exploited Google search as the one service infrastructure common to many repositories, and Google as a default centralised registry-of-registries. Repositories can do better than Google search—even if they hide that from their users; but coordination across repositories is only possible if the repositories adopt common protocol standards. These protocols may be the repositories’ own OAI-PMH and SRU, or the web-wide RSS and OpenSearch; and the discovery may be federated or registry-based.
In either case, as we’ve argued, registries-of-registries require metadata capturing repository scope and coverage, to discover the relevant repositories, and metadata describing their service interfaces, to draw them into the federation. It should be possible to autoconfigure access to the repository services, given the registry-of-registries metadata; and autoconfiguration of services is critical to the work GRI and IMS LODE are undertaking. Within IMS LODE we have already demonstrated that it is possible to use a repository description to autoconfigure a registry to harvest and search that repository.
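The idea of autoconfiguration can be sketched in a few lines of Python. The description format below is invented for the example: it is not the ISO 2146, RIF-CS or LODE schema, and the repository URLs are hypothetical. The only protocol details relied on are the standard OAI-PMH `ListRecords` request parameters and the OpenSearch `{searchTerms}` template placeholder.

```python
from urllib.parse import quote, urlencode

# Hypothetical repository description: scope/coverage metadata for
# discovering the repository, plus service metadata for accessing it.
REPO_DESCRIPTION = {
    "name": "Example Learning Repository",
    "coverage": ["vocational education", "hairdressing"],
    "services": [
        {"type": "oai-pmh",
         "base_url": "http://repository.example.org/oai",
         "metadata_prefix": "oai_dc"},
        {"type": "opensearch",
         "template": "http://repository.example.org/search?q={searchTerms}"},
    ],
}


def harvest_url(service):
    """Build an OAI-PMH ListRecords request from a service description."""
    params = {"verb": "ListRecords",
              "metadataPrefix": service["metadata_prefix"]}
    return service["base_url"] + "?" + urlencode(params)


def search_url(service, terms):
    """Fill in an OpenSearch URL template with percent-encoded search terms."""
    return service["template"].replace("{searchTerms}", quote(terms))


def configure(description, terms):
    """Derive ready-to-use request URLs from a repository description."""
    urls = {}
    for service in description["services"]:
        if service["type"] == "oai-pmh":
            urls["harvest"] = harvest_url(service)
        elif service["type"] == "opensearch":
            urls["search"] = search_url(service, terms)
    return urls


urls = configure(REPO_DESCRIPTION, "hair care")
```

Nothing about the repository is hard-coded in the client: given a different description from the registry-of-registries, the same functions would produce working requests for a different repository, which is what makes ad hoc federation plausible.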
To make all this happen, then, registries-of-registries need a model of repositories, to formulate metadata on. ANDS, GRI, and IMS LODE have independently drawn on the work of ISO 2146:2010, to form their own descriptions as profiles of that model. (ANDS: RIF-CS; GRI: RIF-CS and LibraryFind pilots; LODE: LODE Draft Base Document.) We have documented our conceptual work on ISO 2146 for LODE in an earlier blog post.
However, ISO 2146 is a conceptual model by design, rather than a schema, and in fact was published only last week. The number of initiatives using the model months before it was published suggests that the repository community needs to drive a coherent solution to this emerging requirement. This could mean the community formulating something more explicit than a conceptual model, as a common schema (and making it more openly available).
Guarantees
Linked Data relies on persistent identifiers, as does one of the most critical use cases of repositories: persistent access to the resource. But persistence is as much about perception and badging as it is about actual persistence: any identifier that breaks brings the identifier infrastructure into disrepute. The ARK primitive operator ??, returning a “commitment statement” on how long the identifier is guaranteed to be persisted, is a great idea that is not being realised anywhere—not even in ARK deployments.
More generally than that, repositories draw much of their reason for existence from their authority and curation of data; users need to have guarantees of trustworthiness that are more explicit than the URI domain name—especially once repositories are seen as information infrastructure, as has been the vision all along.
Summary
To summarise:
- The context in which information is discovered and consumed is very different from ten years ago, and so are the associated expectations, particularly from learners.
- Repositories have to keep demonstrating their relevance to users, especially in light of how discovering and consuming information has changed: repositories have to be clearly doing what users want better than Google does, for their domain.
- Repositories must not wall themselves up from the changes in how information is exposed — Web 2.0, REST notions of static resource URIs, Linked Data, generic open standards.
- Research data—and Learning Object metadata—should not be locked away in the Dark Web; this misses opportunities for exposure and reuse.
- The users’ world is Googlified; the repository user experience should be closer to the user’s general context, both in initial search (keyword default), and in how metadata is navigated (facet exposure).
- Repository description, including description of autoconfigurable service interfaces, allows more flexible and extensible models of repository federation.
- Repositories are a service, and need to maintain reliability as a service.




Nick, thanks for bringing out a balanced discussion on closed repository-type searches vs ‘Google’. Both have to acknowledge the other. While school systems are working towards repository solutions, the majority of teachers still just ‘Google’. Some solutions such as NSW DET’s TaLe at least acknowledge the need for harvesting these closed systems, but fall short of being able to search beyond identified registries. Bring on more open/integrated models of repository services.
thand
April 12, 2010 at 3:46 pm