Linking research & learning technologies through standards

Link Affiliates Blog

ADL Registries and Repositories Summit: report

with one comment

The U.S. Advanced Distributed Learning Initiative (ADL) recently convened a Learning Content Registries and Repositories summit (#ADLRR2010) in Alexandria, Va., which Link Affiliates attended. (We have already posted here our position paper for the meeting.)

ADL have been pioneers in developing and disseminating e-learning content; the ADL-Registry and its underlying model CORDRA have been highly influential since their inception in 2003. However, the way information is disseminated and consumed online has changed greatly in the more than six years since, and the expectations of users have changed along with it. The summit was convened to ask:

  • What has happened in the last 6+ years?
  • What are the current business drivers and requirements?
  • What is the state of practice in registries and repositories for learning content?
  • What are the outstanding business and policy issues?
  • What are the outstanding technical issues?
  • What should we (the broader learning, educational, training, repositories and registries communities) be doing?

The summit was arranged as a sequence of panels, with audience questions. The panels reflected perspectives from US Government agencies, repository initiatives, technical interoperability, Web 2.0 and Semantic Web, and content vendors. The summit also included two breakout sessions, on what the current status and problems are in the learning repository space, and on what future priorities for development should be.

I’ve taken blow-by-blow notes of the workshop at the Interoppo Research blog; ADL has also provided links to other blog posts and tweets discussing the summit, as well as position papers requested for the summit. The summit ended with a polyphony of opinions on what to do next. Looking back, however, there are some clear realisations running through the summit; these have been picked up by Dan Rehak and Damon Regan in their summaries (Rehak: PPT, Regan: PDF), and are consistent with the findings of the subsequent CETISROW event (see Phil Barker’s summary).

This is my own skewed summary of what the summit found:

  • We don’t need more standards.
  • We do need to seek out much more feedback from our users: what problems are we trying to solve?
  • The users don’t come to us, they go to Google (Facebook, Twitter, Flickr).
  • We won’t beat Google (Facebook, Twitter, Flickr) at their own game, and should not try to.
    • They build on Open Web content, we should provide Open Web content.
    • They harness content through Open Web standards (as does the Semantic Web): we should expose content through Open Web standards.
    • They set user expectations on discovery; we should break those expectations only if what we do is visibly better.
  • We have unique value as repositories, as authoritative & targeted providers of content. We should promote this—via Open Web channels.
  • We have defined contexts for interacting with content, and means of gathering user contextual data. That contributes to our unique value: better targeted search, or content push anticipating search.
  • Get metadata from wherever you can (automated, user-provided): users already deal with bad metadata every day, and bad metadata is still better than no metadata.
  • Repository federations are growing, but depend on harmonisation and registry metadata (and still coexist with Google).

The following is a more detailed summary.

What isn’t working

US government agencies are committed to transparent government and openly available content: they want their resources used more broadly and more freely. So there is a political driver behind making resource discovery more effective. This was reflected in the summit: it included high level participation from the Department of Education (in the context of the National Education Technology Plan), the Department of Energy, the National Science Foundation, and others.

Repositories are doing what they were built to do: they store and serve up content to their immediate constituencies through purpose-built portals. But as one breakout group put it: “Are repositories solving yesterday’s problems?” There is a growing feeling that repository content is not being discovered effectively by a broader audience. The first of three problems is that there are real difficulties in discovering content across different portals, and many portals do not interoperate with repositories.

Participants agreed that this problem is not going to be addressed by more modelling or more standards: they were emphatic on “no new standards”. (Some went as far as “no new application profiles”: they are almost as dangerous for interoperability as different standards are, and are burdensome to maintain.) The issue is how to get the existing standards and models understood and implemented in the repository community, much of which is still siloed.

The second, more grave problem is: repositories and registries are not being taken up by organisations. Repositories are a lot of overhead to get into; federations of repositories even more so, with protracted negotiation over common policy. Not all organisations see a clear short-term gain in going to the effort of creating metadata, or deciding on the resource granularity to expose for reuse, which repositories require.

A third problem, greater still, is related to this: repositories do not disseminate information the way users now expect to consume it. To get more effective discovery, we need to understand what users actually want from repositories. Unlike what one would expect in software development, the focus has stayed too long on technology instead of actual user requirements. That means the broader user community, not just specialist and power users: repositories no longer target only a specialist audience.

Ben Graff of K12 Inc has a good summary (PPT) of what users—educators and students—expect of repositories in e-learning:

Educators
  • Applicability: I need content at the right size for the right context
  • Discoverability: I want to find the right thing quickly
  • Utility: I want things that I can make work in my environment
  • Community: I want things that my peers can recommend and that they will respect
  • Satisfaction: I want to feel like I’m getting the best available resources
  • Quality: I want things that are proven, authoritative, and innovative
Students
  • Relevance: I want interesting and engaging experiences
  • Applicability: I want help finding the right thing for me right now.
  • (proviso: as a novice, I don’t know what I don’t know.)
Everyone
  • Simplicity: if it’s not easy, I’ll walk away

Making things open

User requirements of repositories are coloured by users’ experience of discovery online. In fact requirements can be gathered more usefully by observing users than by surveying them, because users’ practices are largely unreflected on.

Users now consume information through the Open Web, and Web 2.0-driven interfaces built on the Open Web: that is as true for specialists as it is for general users. Users expect

  • to interact with content (commenting as writers as well as readers);
  • to use social paradigms in accessing content (e.g. recommendation links);
  • and to use simple generic discovery portals (much more like Google than like most repository portals).

This has led the US government (among others) to wonder why getting their content to users can’t be as simple as what Amazon, iTunes and Google do. Administrators in the ADL wonder why they can’t harness Facebook to push learning content out to military personnel. Users have come to expect this level of ease of discovery, because of open, commonly used and simple (indeed, simplistic) content models and standards.

The Semantic Web and Linked Data are not currently as straightforward for users to engage with, yet they too are based on the same common and simple models. The Semantic Web advances the agenda of shared understanding of metadata and standards, rather than mere formal compliance. This enhances discovery, by allowing data to be aggregated from more disparate sources. These approaches are maturing quickly, although they have gathered more research attention in Europe than in the US, and repositories need to be prepared to engage with them.

The appropriate response to these new user expectations is to use social networking and Web 2.0 technologies to drive users to the content. The response is not to build a duplicate of Facebook (or Twitter or RSS) for repositories: it is to open repositories to the existing tools that users already inhabit. The response is not to ignore the informal ways the Web organises information (folksonomies, Wikipedia), but to embrace them as enabling technologies. As one breakout group wryly noted, the standards and repository communities do not have the resources or the influence to sway Google or the major vendors anyway, and gain mindshare with users: standards are left wagging the tail of the dog.

The repository world

The Open Web has prospered because it is simple. The repository world, by contrast with the Open Web, has more complex and rigorous models and standards, appropriate to its more specialised functions.

Yet this complexity is getting in the way of users accessing their content, and (as Sarah Currier has argued) of users quickly getting their own content up. Complexity has also been a problem with repository software, which contrasts unfavourably with Apache-in-a-box: repository software needs to be much simpler to deploy, to drive uptake. (Cf. the division of labour between DSpace and Fedora.) DuraSpace acknowledges it was a mistake to work on perfecting the Fedora infrastructure for years before putting out a user-oriented application.

Web 2.0 technologies have lowered the barrier for users to contribute content; so repositories are now the hard way of getting content up. If education is going to set up alternate structures like repositories, it needs to keep asking itself “what is so special about education”. The question becomes even more compelling as repositories start to seek integration with social networking tools—which puts their content on the same playing field as the rest of the Web.

There are good answers to that question, as our position paper outlined, and they translate into user requirements. But it is those requirements that should be driving repositories, rather than organisational inertia—or even worse, the “Build It And They Will Come” illusion, that a repository is worthwhile even if a user community doesn’t find value in it. Repositories have to demonstrate a return on investment, to all stakeholders—funding agencies as well as students, vendors as well as educators.

While repositories have good reason for their more complex models and protocols, they are also now expected to make their content broadly available. That means exposing their content through the technologies the rest of the Web uses, and not ghettoising themselves in the Dark Web. The repository community is realising that the Web, unruly as it is, is nonetheless now the primary knowledge environment: repositories should not be positioning themselves in competition with it. The repository world has seen repositories as destinations for users, and portals as one-stop shops; that is not how users see it any longer, and repositories are coming to accept that.

The realisation that repositories have to fit into the Web has been slow, and has been translated into change even slower. A surprising number of repositories still do not use common web standards, such as RSS feeds and Sitemaps, to expose their content to users outside their portals. The same holds for identifiers: whatever the disadvantages of HTTP URIs as persistent identifiers, or as conflations of service and identity, they are the fabric of the web, and are how content is integrated into the web. HTTP URIs take a resource rather than a service view of identity; repositories using URIs have to follow suit in how they expose their services.
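As a rough illustration of how little is involved in exposing content this way, the following sketch builds a Sitemap listing for a handful of repository resources. It assumes each resource already has an HTTP URI and a last-modified date; the record structure, the `build_sitemap` helper, and the example.org URIs are hypothetical, and only the Sitemap element names and namespace come from the Sitemaps protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(records):
    """Return a Sitemap XML string with one <url> entry per record.

    `records` is an iterable of (http_uri, last_modified) pairs. This
    record shape is an assumption for illustration, not the export
    format of any particular repository platform.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for uri, last_modified in records:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = uri
        # lastmod uses a W3C date such as 2010-04-20
        ET.SubElement(url, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical resources identified by HTTP URIs.
example_records = [
    ("http://repository.example.org/resource/101", "2010-04-20"),
    ("http://repository.example.org/resource/102", "2010-04-22"),
]
sitemap_xml = build_sitemap(example_records)
print(sitemap_xml)
```

A crawler that reads this file can reach each resource directly by its URI, without going through the repository's portal.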

But here too, it is important for repositories to ask themselves why they want to open their content up, and what sort of interoperability they want to see. Too often standards compliance is treated as a checkbox by organisations acquiring repositories, without a real business understanding of what operability or interoperability the repository should be realising. As a result, each client that vendors deal with ends up asking for their own standard to be implemented, without always having good business reasons for the inconsistency—which makes the problems standards are supposed to solve worse.

Repositories are now starting to engage with the Semantic Web (notably DuraSpace, and cf. OAI ORE): repositories are starting to understand their content as graphs of interrelated resources. Both repositories and the Semantic Web are driven by the need to curate data, which sets them apart from the Open Web, so this is a natural convergence even if they have been driven by different protocols. E-research is also expanding the limits of the kinds of data and relationships that repositories need to deal with.

Repository Federations

Repository federations continue to be promoted as improving discovery across repositories, while preserving consistency and authority. The necessary technologies to make federations work are already in place: the one remaining component, being explored by several initiatives (GRI, ASPECT), is registries of repositories, to enable repository discovery and to autoconfigure service access to the repositories. Repository discovery is relevant not just to end users, but to organisations, which, it was noted, often don’t know what their own repositories contain. And autoconfigurable services, which we argued are key to the notion of the ad hoc federation, are also in line with the trend noted towards abstract search protocols, applicable across repository platforms.

Federations have problems scaling (especially if multiple languages are involved, as in GLOBE/Ariadne): they need to improve discovery in line with user expectations, such as dealing with duplication and broken links; and their users still expect them to cover material outside the formal scope of the federation, such as Slideshare and iTunes University. So even when users come to federation portals, they expect discovery like what they get from Google: results ranked by relevance, simple interfaces, and covering the Open Web as well as the federation. We argued in our position paper that users will only use repositories if they are compellingly better than Google; this also shows they expect repositories will not be compellingly worse.

There is a particular challenge in federating repositories with semantic interoperability. Domain-specific repositories fit their metadata and vocabularies to their disciplines—indeed, their users expect them to. This makes federating content across domains much more difficult, because they represent different views of the world. Only simplified metadata can span across the domains, and make the content accessible to general users; however, specialist users still expect to use their domain metadata in their discovery. This calls for different views of the metadata by different communities (as Rice University’s Connexions platform is exploring with its “lenses”), and a balance between systems interoperability and semantic interoperability.
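One way to picture the two views of the same record is as a crosswalk from domain fields to a small common set. The sketch below is only an illustration under stated assumptions: the domains, field names, and the `simplified_view` and `domain_view` helpers are hypothetical, and it is not a description of how Connexions implements lenses.

```python
# Hypothetical crosswalks: domain-specific field -> simplified common field.
CROSSWALKS = {
    "chemistry": {
        "compound_name": "title",
        "hazard_class": "subject",
        "lab_level": "audience",
    },
    "history": {
        "event_title": "title",
        "period": "subject",
        "year_level": "audience",
    },
}


def simplified_view(record, domain):
    """Project a domain record onto the common fields shared across domains."""
    mapping = CROSSWALKS[domain]
    return {
        common: record[field]
        for field, common in mapping.items()
        if field in record
    }


def domain_view(record):
    """Specialist view: return a copy of the full domain record, unchanged."""
    return dict(record)


chem_record = {
    "compound_name": "Sodium chloride",
    "hazard_class": "non-hazardous",
    "lab_level": "secondary",
    "cas_number": "7647-14-5",
}

print(simplified_view(chem_record, "chemistry"))
print(domain_view(chem_record))
```

The simplified view drops fields such as the CAS number that mean nothing outside the discipline, while the domain view keeps them for the specialists who search on them.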

Sloppier search, and No search

One of the compelling advantages of repositories we argued for in our position paper is that they are better attuned to the user context: they offer metadata specific to what the user is currently doing (as opposed to what they are doing generically online), and they align content to domain structures, such as curricula and competencies. It follows that learning repositories can gather information about what the user is doing in the learning context, to work out the relevance of search results better. This is something like what Google already does with search history.

In the absence of high quality, manually coded metadata, any information that can be gleaned about the resource can be used as metadata, and for some purposes it can be more useful than manually coded metadata. We discuss search metadata here, but the National Science Digital Library (NSDL) STEM Exchange has been doing important work (PPT), automatically gathering feeds of “paradata” about how learning resources are being used by teachers, to supplement other user-provided information like recommendations. (The paradata feeds will be akin to hashtags in their informality, and much of them will be gathered without user involvement.) This information will be fed back to the repository, supplementing formal metadata in order to inform teachers better about how they too can use the resources.

The drive to more user context metadata leads to two opposite outcomes. In the absence of manually coded metadata, exact search may not be practical: the user may not be able to key in the report reference number or the sponsoring organisation, and get back a highly precise set of results. But users familiar with Google no longer expect exact search anyway. What they do expect is good search ranking, with the results they would get from an exact search closest to the top. If anything, however, they welcome the serendipity of related content coming up in searches, particularly if they are working on developing new content. Such results are achieved by Google without hand-coded metadata; the metadata already available to the repository can only enhance that kind of “sloppy search”, improving its ranking. That metadata includes the user’s previous search activity, as user context informing the search.
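A minimal sketch of this kind of ranking follows. It is a toy scoring scheme invented for illustration, not how Google or any repository actually ranks results: the weights, the `rank` function, and the sample resources are all assumptions. Resources containing every query term (what an exact search would return) are pushed to the top, loosely related resources follow, and terms from the user's previous searches give a smaller boost.

```python
def tokens(text):
    """Lower-case a string and split it into a set of words."""
    return set(text.lower().split())


def rank(query, resources, previous_searches=()):
    """Return resource ids ordered by a loose relevance score.

    The weights (1 per query term, 0.5 per context term, +10 when all
    query terms are present) are arbitrary illustrative choices.
    """
    q = tokens(query)
    context = set()
    for past in previous_searches:
        context |= tokens(past)
    context -= q  # do not count query terms twice

    def score(resource):
        words = tokens(resource["title"] + " " + resource.get("description", ""))
        s = len(q & words)                # loose keyword overlap
        s += 0.5 * len(context & words)   # weaker boost from user context
        if q and q <= words:
            s += 10                       # every query term is present
        return s

    scored = [(score(r), r) for r in resources]
    # Drop resources that share no words with the query or the context.
    scored = [(s, r) for s, r in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r["id"] for _, r in scored]


resources = [
    {"id": "r1", "title": "Introduction to cell biology", "description": "Overview of cell structure"},
    {"id": "r2", "title": "Cell division lab", "description": "Mitosis practical for year 10"},
    {"id": "r3", "title": "Plate tectonics", "description": "Earth science unit"},
    {"id": "r4", "title": "Biology of plants", "description": "Photosynthesis basics"},
]

print(rank("cell biology", resources, previous_searches=["mitosis"]))
```

Here the resource matching both query terms comes first, and the earlier search for "mitosis" lifts the cell division lab above the other partial match.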

But “sloppy search” still presupposes a user actively seeking out and sifting through content. That makes sense for a content developer, for example; but a learner is accessing content through a quite different process, which is more tightly defined, and which can be second-guessed far more effectively. Ultimately, repositories should have enough contextual information on the user and what they are up to, that the user should not have to search for content at all: the repository should be able to anticipate what content is most relevant to the user at that point in their learning, and push it out to them.

From the viewpoint of pedagogy, bypassing search is not unfamiliar: it is going back to the notion of set texts in schools. In the context of recommended content, it is what Amazon and Facebook already do as well. But as the breakout group that I was in blue-skied it, it has sweeping implications for how learners may experience repositories: user context is invaluable both to commercial and education providers, and there is room for such information to be brokered between different providers.

This has huge security and privacy implications of course. (It sounds like an ’80s science fiction novel—but then, so does much of our lives online now.) However the trend has consistently been for user convenience to outweigh user privacy, and this will be a great improvement in user convenience: the user no longer has to do a search for content at all, and sift through irrelevant results. Given that students are already in the more structured environment of formal education, they do not expect to have to do searches for content anyway.

What else needs fixing?

The first breakout session was asked to work out what the current problems are with repositories; the second, how best to spend $1m (or $10m) to deal with those problems. While the summary above gives a particular narrative about that, the groups identified a range of other blockers that need to be dealt with:

  • Reuse in the repository domain is not as simple as hitting a button on iTunes: this needs real understanding of users and user needs, and of what the authority and mandate for reuse is. Reuse is very hard to get right “outside the schoolhouse”. In fact there are real incentives in organisations to duplicate rather than reuse content.
  • There are still cultural and organisational problems with opening up content and promoting reuse—not least the Not Invented Here attitude; access restrictions compound the problem. Repository workers have to recognise that not everyone will need to, want to, or be able to work together.
  • Metadata remains difficult to get people to write. Automated and user-contributed alternatives for metadata need to be considered: “bad” metadata is better than none (as users’ experience with the Open Web has taught them). Legacy data are also difficult to establish metadata for.
  • Repositories still need to work their way through identity management and trust, to enable single sign on and secure data transfer, and to have interfaces more clearly aligned to user roles. More generally, the repository world needs more middleware and web services to facilitate interoperability.
  • Repositories should provide direct connection to content authors, who can enable reuse at the point of authoring. Repositories can be embedded into the process of authoring content which is already destined for the repository: that can make many of the workflow problems with resource and metadata quality go away. (See ICE, REPOMMAN, and SWORD for attempts at such embedding: the embedding can be as lightweight as an Atom widget.) In the case of the US military, external contractors are authoring content without access to the repositories it will go into. The problems with that should be obvious.
  • Bottom-up approaches to repositories make it difficult to get shared vision.
  • There is still a lack of best-practice guidance, or answers to best-practice questions from the community.
  • Open Source is not a cure-all, nor a guarantee of sustainable innovation: it is not the only possible future. In fact the highest-profile recent technical innovation has come from the closed shop of Apple. Openness is a means, not a goal in itself.

    On that last point, see also this recent exchange at CETIS between Andy Powell and Phil Barker, if Phil will pardon my paraphrase:

    —Are you arguing that iTunes University is open?
    —No, I’m arguing that iTunes University gives users what they want—which is more important.

One Response

  1. Social comments and analytics for this post…

    This post was mentioned on Twitter by mrch0mp3rs: Very well-articulated summary > RT @danielrehak: Nick Nicholas has posted his summary thoughts for #ADLRR10 http://tinyurl.com/25g5de9...

    uberVU - social comments

    April 28, 2010 at 2:12 am

