ADL Registries and Repositories Summit: report
The U.S. Advanced Distributed Learning Initiative (ADL) recently convened a Learning Content Registries and Repositories summit (#ADLRR2010) in Alexandria, Va., which Link Affiliates attended. (We have already posted here our position paper for the meeting.)
ADL have been pioneers in developing and disseminating e-learning content; the ADL-Registry and its underlying model CORDRA have been highly influential since their inception in 2003. However, the way information is disseminated and consumed online has changed greatly in the six years since, and the expectations of users have changed along with it. The summit was convened to ask:
- What has happened in the last 6+ years?
- What are the current business drivers and requirements?
- What is the state of practice in registries and repositories for learning content?
- What are the outstanding business and policy issues?
- What are the outstanding technical issues?
- What should we (the broader learning, educational, training, repositories and registries communities) be doing?
The summit was arranged as a sequence of panels, with audience questions. The panels reflected perspectives from US Government agencies, repository initiatives, technical interoperability, Web 2.0 and Semantic Web, and content vendors. The summit also included two breakout sessions, on what the current status and problems are in the learning repository space, and on what future priorities for development should be.
I’ve taken blow-by-blow notes of the workshop at the Interoppo Research blog; ADL has also provided links to other blog posts and tweets discussing the summit, as well as to the position papers requested for the summit. The summit ended with a polyphony of opinions on what to do next. Looking back, however, there are some clear realisations running through the summit; these have been picked up by Dan Rehak and Damon Regan in their summaries (Rehak: PPT, Regan: PDF), and are consistent with the findings of the subsequent CETISROW event (see Phil Barker’s summary).
This is my own skewed summary of what the summit found:
- We don’t need more standards.
- We do need to seek out much more feedback from our users: what problems are we trying to solve?
- The users don’t come to us, they go to Google (Facebook, Twitter, Flickr).
- We won’t beat Google (Facebook, Twitter, Flickr) at their own game, and should not try to.
- They build on Open Web content, we should provide Open Web content.
- They harness content through Open Web standards (as does the Semantic Web): we should expose content through Open Web standards.
- They set user expectations on discovery; we should break those expectations only if what we do is visibly better.
- We have unique value as repositories, as authoritative & targeted providers of content. We should promote this—via Open Web channels.
- We have defined contexts for interacting with content, and means of gathering user contextual data. That contributes to our unique value: better targeted search, or content push anticipating search.
- Get metadata from wherever you can (automated, user-provided): users already deal with bad metadata every day, and bad metadata is still better than no metadata.
- Repository federations are growing, but depend on harmonisation and registry metadata (and still coexist with Google).
The following is a more detailed summary.
What isn’t working
US government agencies are committed to transparent government and openly available content: they want their resources used more broadly and more freely. So there is a political driver behind making resource discovery more effective. This was reflected in the summit: it included high level participation from the Department of Education (in the context of the National Education Technology Plan), the Department of Energy, the National Science Foundation, and others.
Repositories are doing what they were built to do: they store and serve up content to their immediate constituencies through purpose-built portals. But as one breakout group put it: “Are repositories solving yesterday’s problems?” There is a growing feeling that repository content is not being discovered effectively beyond those constituencies. The first of three problems is that there are real difficulties in discovering content across different portals, and many portals do not interoperate with repositories.
Participants agreed that this problem is not going to be addressed by more modelling or more standards: they were emphatic on “no new standards”. (Some went as far as “no new application profiles”: they are almost as dangerous for interoperability as different standards are, and are burdensome to maintain.) The issue is how to get existing standards and models understood and implemented in the repository community, much of which is still silo’d.
The second, more grave problem is: repositories and registries are not being taken up by organisations. Repositories are a lot of overhead to get into; federations of repositories even more so, with protracted negotiation over common policy. Not all organisations see a clear short-term gain in going to the effort of creating metadata, or deciding on the resource granularity to expose for reuse, which repositories require.
A third problem, greater still, is related to this: repositories do not disseminate information the way users now expect to consume it. To get more effective discovery, we need to understand what users actually want from repositories. Contrary to what one would expect from good software development practice, the focus has stayed too long on technology rather than on actual user requirements. That means the broader user community, not just specialist and power users: repositories no longer target only a specialist audience.
Ben Graff of K12 Inc has a good summary (PPT) of what users—educators and students—expect of repositories in e-learning:
| Educators | Students |
|---|---|
|
|
| Everyone | |
Making things open
User requirements of repositories are coloured by users’ experience of discovery online. In fact, requirements can be gathered more usefully by observing users than by surveying them, because users rarely reflect on their own practices. Users now consume information through the Open Web, and through Web 2.0-driven interfaces built on the Open Web: that is as true for specialists as it is for general users. Users expect
This has led the US government (among others) to wonder why getting their content to users can’t be as simple as it is for Amazon, iTunes and Google. Administrators in the ADL wonder why they can’t harness Facebook to push learning content out to military personnel. Users have come to expect this level of ease of discovery, because of open, commonly used and simple—indeed, simplistic—content models and standards. The Semantic Web and Linked Data are not currently as straightforward for users to engage with, yet they too are based on the same common and simple models. The Semantic Web advances the agenda of shared understanding of metadata and standards, rather than mere formal compliance. This enhances discovery, by allowing data to be aggregated from more disparate sources. These approaches are maturing quickly (although they have gathered more research attention in Europe than in the US), and repositories need to be prepared to engage with them.
The appropriate response to these new user expectations is to use social networking and Web 2.0 technologies to drive users to the content. The response is not to build a duplicate of Facebook (or Twitter or RSS) for repositories: it is to open repositories to the existing tools that users already inhabit. The response is not to ignore the informal ways the Web organises information (folksonomies, Wikipedia), but to embrace them as enabling technologies. As one breakout group wryly noted, the standards and repository communities do not have the resources or the influence to sway Google or the major vendors anyway, or to win mindshare with users: standards are left as the tail trying to wag the dog.
The repository world
The Open Web has prospered because it is simple. The repository world, by contrast, has more complex and rigorous models and standards, appropriate to its more specialised functions.
Yet this complexity is getting in the way of users accessing their content, and (as Sarah Currier has argued) of users quickly getting their own content up. Complexity has also been a problem with repository software, which compares unfavourably with Apache-in-a-box: repository software needs to be much simpler to deploy, to drive uptake. (Cf. the division of labour between DSpace and Fedora.) DuraSpace acknowledges it was a mistake to work on perfecting the Fedora infrastructure for years before putting out a user-oriented application. Web 2.0 technologies have lowered the barrier for users to contribute content, so repositories are now the hard way of getting content up.
If education is going to set up alternate structures like repositories, it needs to keep asking itself “what is so special about education?” The question becomes even more compelling as repositories start to seek integration with social networking tools, which puts their content on the same playing field as the rest of the Web. There are good answers to that question, as our position paper outlined, and they translate into user requirements. But it is those requirements that should be driving repositories, rather than organisational inertia—or even worse, the “Build It And They Will Come” illusion, that a repository is worthwhile even if a user community doesn’t find value in it. Repositories have to demonstrate a return on investment to all stakeholders: funding agencies as well as students, vendors as well as educators.
While repositories have good reason for their more complex models and protocols, they are also now expected to make their content broadly available. That means exposing their content through the technologies the rest of the Web uses, and not ghettoising themselves in the Dark Web. The repository community is realising that the Web, unruly as it is, is nonetheless now the primary knowledge environment: repositories should not be positioning themselves in competition with it.
The repository world has seen repositories as destinations for users, and portals as one-stop shops; that is not how users see it any longer, and repositories are coming to accept that. The realisation that repositories have to fit into the Web has been slow, and its translation into change slower still. A surprising number of repositories still do not use common web standards, such as RSS feeds and Sitemaps, to expose their content to users outside their portals. The same holds for identifiers: whatever the disadvantages of HTTP URIs as persistent identifiers, or as conflations of service and identity, they are the fabric of the web, and are how content is integrated into the web. HTTP URIs take a resource rather than a service view of identity; repositories using URIs have to follow suit in how they expose their services.
But here too, it is important for repositories to ask themselves why they want to open their content up, and what sort of interoperability they want to see. Too often, standards compliance is treated as a checkbox by organisations acquiring repositories, without a real business understanding of what operability or interoperability the repository should be realising. As a result, each client that vendors deal with ends up asking for its own standard to be implemented, without always having good business reasons for the inconsistency, which worsens the very problems standards are supposed to solve.
Repositories are now starting to engage with the Semantic Web (notably DuraSpace, and cf. OAI-ORE): they are starting to understand their content as graphs of interrelated resources. Both repositories and the Semantic Web are driven by the need to curate data, and both stand in contrast to the Open Web, so this is a natural convergence even if they have been driven by different protocols. E-research is also expanding the limits of the kinds of data and relationships that repositories need to deal with.
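To make the point about Sitemaps and HTTP URIs concrete, here is a minimal Python sketch of a repository publishing its resource URIs as a Sitemap in the sitemaps.org 0.9 format, so that crawlers can reach content outside the portal. The `Resource` record, the `build_sitemap` function and the example URIs are hypothetical illustrations, not part of any repository platform's API; the sketch assumes each resource already has a stable HTTP URI and a last-modified date.

```python
# Minimal sketch: expose repository resources to web crawlers via a Sitemap.
# Assumption: each resource already has a stable HTTP URI and a last-modified
# date; the Resource record and example URIs are hypothetical.
from dataclasses import dataclass
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Resource:
    uri: str             # HTTP URI for the resource itself, not a search service
    last_modified: date  # reported to crawlers as <lastmod>


def build_sitemap(resources: list[Resource]) -> str:
    """Return a sitemaps.org <urlset> document listing each resource URI."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for res in resources:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URI, as the protocol requires.
        ET.SubElement(url, "loc").text = res.uri
        ET.SubElement(url, "lastmod").text = res.last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    demo = [
        Resource("https://repository.example.org/resource/1234", date(2010, 4, 20)),
        Resource("https://repository.example.org/resource/5678", date(2010, 4, 26)),
    ]
    print(build_sitemap(demo))
```

The design choice here is the resource view of identity: each `<loc>` is the URI of the content itself rather than a query against the repository's search service. A real deployment would regenerate the file as content changes and advertise it from `robots.txt`; the protocol caps a single Sitemap at 50,000 URLs, beyond which a Sitemap index file is needed.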
Repository Federations
Repository federations continue to be promoted as improving discovery across repositories, while preserving consistency and authority. The technologies needed to make federations work are already in place; the one remaining component, being explored by several initiatives (GRI, ASPECT), is the registry of repositories, which enables repository discovery and autoconfigures service access to the repositories. Repository discovery is relevant not just to end users but to organisations, which, it was noted, often don’t know what their own repositories contain. And autoconfigurable services, which we argued are key to the notion of the ad hoc federation, are also in line with the trend noted towards abstract search protocols, applicable across repository platforms.
Federations have problems scaling (especially if multiple languages are involved, as in GLOBE/Ariadne). They need to improve discovery in line with user expectations, such as dealing with duplication and broken links; and their users still expect them to cover material outside the formal scope of the federation, such as Slideshare and iTunes University. So even when users come to federation portals, they expect discovery like what they get from Google: results ranked by relevance, simple interfaces, and coverage of the Open Web as well as the federation. We argued in our position paper that users will only use repositories if they are compellingly better than Google; this also shows that users expect repositories not to be compellingly worse.
There is a particular challenge in federating repositories with semantic interoperability. Domain-specific repositories fit their metadata and vocabularies to their disciplines; indeed, their users expect them to. This makes federating content across domains much more difficult, because they represent different views of the world.
Only simplified metadata can span the domains and make the content accessible to general users; however, specialist users still expect to use their domain metadata in their discovery. This calls for different views of the metadata for different communities (as Rice University’s Connexions platform is exploring with its “lenses”), and a balance between systems interoperability and semantic interoperability.
Sloppier search, and No search
One of the compelling advantages of repositories we argued for in our position paper is that they are better attuned to the user context: they offer metadata specific to what the user is currently doing (as opposed to what they are doing generically online), and they align content to domain structures, such as curricula and competencies. It follows that learning repositories can gather information about what the user is doing in the learning context, to better judge the relevance of search results. This is something like what Google already does with search history.
In the absence of high-quality, manually coded metadata, any information that can be gleaned about the resource can be used as metadata, and for some purposes it can be more useful than manually coded metadata. We discuss search metadata here, but the National Science Digital Library (NSDL) STEM Exchange has been doing important work (PPT), automatically gathering feeds of “paradata” about how teachers are using learning resources, to supplement other user-provided information such as recommendations. (The paradata feeds will be akin to hashtags in their informality, and much of that data will be gathered without user involvement.) This information will be fed back to the repository, supplementing formal metadata in order to inform teachers better about how they too can use the resources. The drive to more user context metadata leads to two opposite outcomes.
In the absence of manually coded metadata, exact search may not be practical: the user may not be able to key in the report reference number or the sponsoring organisation and get back a highly precise set of results. But users familiar with Google no longer expect exact search anyway. What they do expect is good search ranking, with the results they would get from an exact search closest to the top. If anything, they welcome the serendipity of related content coming up in searches, particularly if they are working on developing new content. Google achieves such results without hand-coded metadata; the metadata already available to the repository can only enhance that kind of “sloppy search”, improving its ranking. That metadata includes the user’s previous search activity, as user context informing the search.
But “sloppy search” still presupposes a user actively seeking out and sifting through content. That makes sense for a content developer, for example; but a learner accesses content through quite a different process, which is more tightly defined, and which can be second-guessed far more effectively. Ultimately, repositories should have enough contextual information on users and what they are up to that users should not have to search for content at all: the repository should be able to anticipate what content is most relevant to the user at that point in their learning, and push it out to them. From the viewpoint of pedagogy, bypassing search is not unfamiliar: it is a return to the notion of set texts in schools. In the context of recommended content, it is what Amazon and Facebook already do as well. But as the breakout group that I was in blue-skied it, it has sweeping implications for how learners may experience repositories: user context is invaluable both to commercial and education providers, and there is room for such information to be brokered between different providers.
This has huge security and privacy implications, of course. (It sounds like an ’80s science fiction novel, but then, so does much of our lives online now.) However, the trend has consistently been for user convenience to outweigh user privacy, and this will be a great improvement in user convenience: the user no longer has to search for content at all, or sift through irrelevant results. Given that students are already in the more structured environment of formal education, they do not expect to have to do searches for content anyway.
What else needs fixing?
The first breakout session was asked to work out what the current problems are with repositories; the second, how best to spend $1m (or $10m) to deal with those problems. While the summary above gives a particular narrative about that, the groups identified a range of other blockers that need to be dealt with:
|
|



Social comments and analytics for this post…
This post was mentioned on Twitter by mrch0mp3rs: Very well-articulated summary > RT @danielrehak: Nick Nicholas has posted his summary thoughts for #ADLRR10 http://tinyurl.com/25g5de9...
uberVU - social comments
April 28, 2010 at 2:12 am