Linking research & learning technologies through standards

Link Affiliates Blog

Posts Tagged ‘data modelling’

SIF Updates and Progress

SIF Association AU recently held a two day workshop for the Data Standards Working Group, which has been working for the past couple of years on the Australian data model and specification for SIF. These are some of the highlights of the meeting:

New SIF Association US Standard

SIF Implementation Specification 2.4 is going to be released in early June; a preview of the features to be included is already available. (See also Larry Fruth’s presentation (PPT) at the recent IDEA10 event.) The new release of SIF features new objects and attributes, including improved coverage of assessment and its alignment to curricula, and objects to support special programmes for staff and students (student participation, professional development). But there are two major additions in this version taking SIF in new directions.

Comparison, People Australia and Register My Data encoding of parties

We have already presented the People Australia and the Register My Data initiatives, and their different approaches to encoding information about parties and their identity. Elsewhere we compare their schemata in detail: a walkthrough of the schemata, followed by a discussion of points of disparity. We first compare People Australia with ISO 2146 proper, before comparing ISO 2146 with RIF-CS.

Our comparison is motivated by the fact that ANDS will be using People Australia as a primary resource for researcher identity. The comparison is specific to the process of importing People Australia metadata into the format required for Register My Data.

Written by Nick Nicholas

December 10, 2009 at 5:04 pm

People Australia and Register My Data encoding of parties

We have seen in a previous post that different representations of identity are possible, because there are different business motivations for knowing a party’s identity. Depending on the use we put the identity to, different kinds of detail need to be gathered about a party.

There are two major initiatives for identifying parties being considered at the moment in Australian e-research. Register My Data aims to improve the discovery of research data through the Australian Research Data Commons, and People Australia aims to improve the discovery of resources by and about people and organisations generally. The initiatives do not address exactly the same business concerns, so the metadata they gather are different.

Written by Nick Nicholas

December 4, 2009 at 11:13 am

Modelling identity for different purposes

Registries of data—whether in research, learning, government, or other domains, and whether repositories, data warehouses, Learning Management Systems, or libraries—typically contain metadata not just on the content itself, but on who the data came from. The people responsible for the data are of interest to the people consuming the data; so registries need to record information about them as well. The primary kind of people (or groups of people) that are of interest are the authors of the data—or, where that concept is not as applicable, the contributors or compilers of the data. (Because institutions and organisations can also claim authorship, we prefer to refer to parties rather than people, following the ISO 2146 information model for registries.) But many parties can be responsible for data ending up in a registry, in the form it does; a registry can track a range of parties involved with data, in a range of roles: publisher, editor, validator, annotator, designer.

Because it is important to record information about parties, lots of registries record that information, in lots of ways, and to widely varying extents of detail. That means that there are a variety of information models at play for parties in registries. That doesn’t mean that all information models are rigorous and well thought out. Whacking in just the login name of an uploader, as YouTube does, is itself an information model for a party involved with the content, even if the amount of thought that went into it was not overwhelming.

But that does not mean YouTube’s information model is wrong. How much information you capture on parties for a registry depends on what use that information will be put to in the registry. The information model for parties is driven by the business requirements of the registry.

That of course is no great surprise, and working out what information is required is not particularly onerous: people may not put a lot of thought into it when they put registries together, but often enough they don’t need to. Still, especially if you are shopping for standards on representing parties, it is worth spending a couple of minutes working out what you need—and as importantly, what you don’t need.

Written by Nick Nicholas

November 16, 2009 at 10:05 am

Fluid identity in repositories

The business of a library is to establish authoritative identities for the works it makes available. That is why libraries put together authority files, as unambiguous names for authors: those are the names books are indexed under, and searched under in library catalogues. The advantages of having an unambiguous identity for an author are obvious. A researcher who wants credit for their work (or the department whose funding depends on it) doesn’t want credit to go to another researcher with the same name. Anyone collecting royalties on their published work will want their identity to be unambiguous as well, though not all fields of research make it as worthwhile to chase after residuals.

Library users also appreciate disambiguation: if I am looking for works by or about the contemporary German novelist Richard Wagner (1952- ), I’d like to avoid the deluge of works by or about the slightly more famous German composer Richard Wagner (1813-1883). And a library catalogue is being helpful when it includes the dates of birth to differentiate between the two Richard Wagners—just as Wikipedia is, when it refers to Richard_Wagner_(novelist).

Making those kinds of distinctions depends on having good enough metadata on the authors. If you’ve published a dead-tree book in the past few decades, your national library has been in cahoots with your publisher to make sure they have that metadata. *I* don’t remember giving the Library of Congress my year of birth, but it avoids a car dealer in Florida getting credit for any books I’ve written. (See Libraries Australia.)

Written by Nick Nicholas

October 21, 2009 at 6:54 am

IMS LODE: Discovery through Collection Descriptions

We have already discussed our development activities around the IMS LODE activity for discovery of learning objects. However, what we have described so far presupposes that learning object descriptions are already available to a user, because the user can access those descriptions in their local repository, or through a repository federation they have access to.

But there will not in the foreseeable future be a Super-Federation of all education repositories in the world, nor indeed does there need to be. Rather than unleashing users on all e-learning repositories in the world, it makes more sense for users to discover learning object collections that they don’t already have access to—but which are of direct interest to them. So users should be able to target their searches for content to the collections which will pay off, instead of doing an inefficient, iterative blanket search across Everything.

Data Standards and Localisation for SIF-AU

As in other sectors, schools have long been burdened with the incompatibility of the multiple IT systems used to run their business. The Learning Management Systems, the Student Enrolment Systems, the systems dealing with assessment, pastoral care, attendance, staffing, timetabling—all of these store data about the same students in different ways, and each exports data in its own way. To get the systems to share data between each other has often meant costly custom porting for each pair—where it has not involved printing the data out, and rekeying it from scratch. Waving printouts at a keyboard is not, of course, fulfilling the promise of the paperless office, and it hardly translates to data at one’s fingertips.

The school sector realised quite early (1998) that something could be done about this, and the Systems Interoperability Framework (SIF) was developed Stateside in response to it. SIF was developed several years before Service-Oriented Architecture started to address similar issues of data incompatibility in industry, but it takes an approach similar enough that it can be stated in SOA terms. Data is exchanged between systems across a common trust environment, using common data structures in XML—just like the Enterprise Service Bus and SOAP of SOA. Systems are able to exchange data because agents translate their native data to the common formats and back again. Data can be pulled in, in a request–response pattern, or pushed out, in a subscription pattern. Unlike SOA, the protocols and common data models are standardised and fixed ahead of time for the school domain, and do not require the systems to be reengineered to fit the system protocols better; so SIF can be layered over existing systems relatively straightforwardly.
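
The pattern described above (agents translating native records to a common XML form, with data either pulled on request or pushed to subscribers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (Student, FamilyName, GivenName), the native field names, and the Broker class are all hypothetical, and none of them is taken from the actual SIF specification or its transport protocol.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def to_common_xml(native):
    """Agent, outbound: translate a native enrolment record into the shared XML form.

    Element and field names are hypothetical, not the real SIF schema.
    """
    student = ET.Element("Student", {"id": native["stu_no"]})
    ET.SubElement(student, "FamilyName").text = native["surname"]
    ET.SubElement(student, "GivenName").text = native["first"]
    return ET.tostring(student, encoding="unicode")


def from_common_xml(xml_text):
    """Agent, inbound: translate the shared XML form into another system's native record."""
    student = ET.fromstring(xml_text)
    given = student.findtext("GivenName")
    family = student.findtext("FamilyName")
    return {"student_id": student.get("id"), "name": f"{given} {family}"}


class Broker:
    """Hypothetical in-memory stand-in for the shared exchange environment."""

    def __init__(self):
        self._providers = {}
        self._subscribers = defaultdict(list)

    def register_provider(self, object_type, handler):
        self._providers[object_type] = handler

    def request(self, object_type, key):
        # Pull: request-response pattern.
        return self._providers[object_type](key)

    def subscribe(self, object_type, callback):
        self._subscribers[object_type].append(callback)

    def publish(self, object_type, xml_text):
        # Push: subscription pattern.
        for callback in self._subscribers[object_type]:
            callback(xml_text)


# Usage: an enrolment system provides students; an attendance system consumes them.
enrolment_db = {"1001": {"stu_no": "1001", "surname": "Nguyen", "first": "Kim"}}
broker = Broker()
broker.register_provider("Student", lambda key: to_common_xml(enrolment_db[key]))

received = []
broker.subscribe("Student", lambda xml: received.append(from_common_xml(xml)))

pulled = from_common_xml(broker.request("Student", "1001"))
broker.publish("Student", to_common_xml(enrolment_db["1001"]))
```

Neither system needs to know the other's native format; each only needs an agent that can read and write the common one, which is why the approach layers over existing systems.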

The recent trend in the Australian government school sector, and to a growing extent in the Catholic sector, has been to host school systems centrally; this leads to greater efficiencies and security in how data is handled and exchanged, and relieves schools from the burden of having to run systems themselves. However, jurisdictions still have to deal with multiple systems internally, some more centralised than others. They also have occasion to exchange data with other jurisdictions and schools, especially when students move interstate, or in dealing with national testing and benchmarking. Dealing with these issues has made SIF an attractive proposition for the Australian school sector, whether to support the integration of all their internal systems, as is taking place in Victoria, or to provide a consistent outward interface to the data they are authorised to share. This has led to the SIF Association AU initiative, led by representatives from all Australian school systems.

As a relatively lightweight technical architecture, SIF does not particularly depend on where it is deployed, and there have already been several successful deployments of SIF in the UK, in addition to the pilots now underway in Australia. What does need to change from place to place is the model underlying the common data format that the system uses. The original SIF data model deals with the realities of the American school system, so it represents data that makes sense for that context. Because students are provided lunch at school, the logistics of school canteens are a major concern of SIF data modelling, and there are obvious dollar and cent efficiencies in getting the canteen system to talk to the student attendance system. When Australian school systems exchange information, canteen logistics are not a major concern; but getting timetabling right is.

Likewise, the data fields and values captured in SIF reflect American conventions and requirements: they deal in quarters and quinmesters, charter schools, and demographics driven by the American Census and the NCES. The data collected and exchanged in Australian schools needs to reflect Australian requirements, and to conform to Australian norms and conventions. So data is about ESL, not English Proficiency, and the codes for countries and languages are the Australian Bureau of Statistics’. Dealing with these codes in turn means addressing questions such as how many different shades of “Not Applicable” to allow for Yes/No fields, or whether to include both the nodes and the leaves in a hierarchical vocabulary (e.g. whether to allow Netherlandic as a language choice, or only its child nodes, Dutch and Frisian).
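
The nodes-versus-leaves question can be made concrete with a small sketch. The vocabulary below contains only the three terms mentioned above, and the function names and the leaves_only switch are hypothetical; real ABS classifications are much larger and carry codes as well as labels.

```python
# Hypothetical fragment of a hierarchical language vocabulary:
# each term maps to its child terms (an empty list marks a leaf).
VOCAB = {
    "Netherlandic": ["Dutch", "Frisian"],
    "Dutch": [],
    "Frisian": [],
}


def allowed_values(vocab, leaves_only):
    """Return the set of terms a data field may take under the chosen policy."""
    if leaves_only:
        return {term for term, children in vocab.items() if not children}
    return set(vocab)


def is_valid(term, vocab, leaves_only=True):
    """Check a submitted value against the vocabulary under the chosen policy."""
    return term in allowed_values(vocab, leaves_only)


# Policy A: only the most specific terms may be recorded.
dutch_ok = is_valid("Dutch", VOCAB)
group_rejected = not is_valid("Netherlandic", VOCAB)

# Policy B: broader grouping terms are also acceptable.
group_ok = is_valid("Netherlandic", VOCAB, leaves_only=False)
```

Allowing only leaves gives more precise data but forces a choice when the source system only knows the broader category; allowing nodes as well accepts coarser data at the cost of less uniform reporting.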

Link Affiliates and the ABS, along with members of the SIF-AU data working group, gave feedback on how best to specify the data objects and vocabularies to be used. The SIF-AU draft standard is now being vetted by SIF; meanwhile, pilot projects are underway in several jurisdictions, with an aim to finalise work by the end of the year.