People Australia and Register My Data encoding of parties
We have seen in a previous post that different representations of identity are possible, because there are different business motivations for knowing a party’s identity. Depending on the use we put the identity to, different kinds of detail need to be gathered about a party.
There are two major initiatives for identifying parties being considered at the moment in Australian e-research. Register My Data aims to improve the discovery of research data through the Australian Research Data Commons, and People Australia aims to improve the discovery of resources by and about people and organisations generally. The initiatives do not address exactly the same business concerns, so the metadata they gather are different.
People Australia
The first initiative is the National Library of Australia’s People Australia program. People Australia at its core is a public identity resolution service. People Australia gathers records about people and organisations from a range of collaborating services. Records matching already existing identities are collocated with that identity. Records that do not match are created as new identities. Each identity is assigned a persistent identifier (e.g. “Darwin’s bulldog” Thomas Huxley: http://nla.gov.au/nla.party-869246). People Australia currently makes available around 870,000 names, and is based upon and extends the Australian Name Authority File maintained by Libraries Australia.
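The match-or-create behaviour described above can be sketched roughly as follows. This is only an illustration, not People Australia's actual matching logic: the `IdentityResolver` class, the rule of matching on a normalised name alone, and the `example-party-N` identifiers are all assumptions made for the example. A real service would weigh far more evidence (dates, affiliations, existing authority records) before collocating records.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Identity:
    identifier: str                       # stand-in for a persistent identifier
    records: list = field(default_factory=list)


class IdentityResolver:
    """Hypothetical resolver: collocate matching records, else mint an identity."""

    def __init__(self):
        self._by_key = {}                 # match key -> Identity
        self._ids = count(1)              # source of made-up identifier numbers

    @staticmethod
    def _match_key(record):
        # Assumption: two records match if their names agree after
        # lower-casing and collapsing whitespace.
        return " ".join(record["name"].lower().split())

    def ingest(self, record):
        key = self._match_key(record)
        identity = self._by_key.get(key)
        if identity is None:
            # No existing identity matches: create a new one.
            identity = Identity(identifier=f"example-party-{next(self._ids)}")
            self._by_key[key] = identity
        # Either way, the record is collocated with its identity.
        identity.records.append(record)
        return identity


resolver = IdentityResolver()
a = resolver.ingest({"name": "Thomas Huxley", "source": "library-a"})
b = resolver.ingest({"name": "thomas  huxley", "source": "archive-b"})
c = resolver.ingest({"name": "Charles Darwin", "source": "library-a"})
```

Here the first two records end up under one identity and the third under another; contributors keep their own records, and the resolver only groups them.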
As part of its work the People Australia program gathers biographical and contextual information about people and organisations. People Australia profiles the Encoded Archival Context (EAC) “beta” (2004 version) schema. The EAC schema is expansive, and highly flexible: all vocabularies can be profiled, there is strong multilingual support, and there is explicit provision for detailed data on biographical and cultural context. But the People Australia profile of the schema is not drastically smaller than full EAC, because People Australia needs an expansive schema for what it is doing.
The National Library has invested considerable resources in creating the next version of EAC: EAC-CPF (Encoded Archival Context for Corporate bodies, Persons and Families). Once the EAC-CPF standard is released, the People Australia program will migrate to it and produce the relevant documentation and mappings. The new EAC-CPF standard is more generic and removes many of the idiosyncrasies and dependencies on archival practice found in EAC. EAC-CPF will be simpler to use and a better fit for what the People Australia program is aiming to achieve.
Register My Data
The second initiative is the Australian National Data Service’s Register My Data service, which leads to the establishment of an Australian Research Data Commons (ARDC). The purpose of this service is strictly data discovery. The service is an aggregator of research data stored elsewhere, so it is not expected to provide any authority for the data it registers: the detail on the data, and parties involved with the data, reside in the source repositories. Data discovery requires much less information than may be provided by EAC, which treats parties as subject matter. Users still need to disambiguate between different researchers with the same name, but because they will be viewing the metadata about research outputs directly, they can do that disambiguation on their own, as they would do with any internet search.
The ARDC is intended to be a comprehensive registry of research done in Australia. It needs to gather information on as many researchers in Australia as it can, since information on researchers drives discovery (and authority in the second instance). People Australia sets out to gather information on as many people and organisations as possible in Australia and internationally, and the overlap with Australian researchers should be considerable; so ANDS naturally sees People Australia as a primary resource, although the metadata People Australia gathers is not a direct fit with the ARDC.
Register My Data models identity on the ISO 2146 information model of Registry Services, which we have already discussed here for e-learning rather than e-research registries. It profiles ISO 2146 as RIF-CS, and requires contributors to make data on researchers harvestable in that schema.
ISO 2146 models parties as entities involved in the business of a registry, along with services, collections, and activities: it models all four as specialisations of a common Registry Object. Because it limits its modelling to the business context of registries, it gathers a more constrained set of metadata; the metadata it gathers on parties purposefully has much in common with what it gathers for services, collections and activities. RIF-CS is more tightly constrained still: it serialises ISO 2146 flatly, and it eliminates a number of optional elements. (ISO 2146, as an information model, does not prescribe a particular serialisation.)
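The idea of four specialisations sharing a common Registry Object can be pictured with a small class hierarchy. This is a loose sketch of the modelling approach only: the field names `key`, `name` and `description` and the `object_class` helper are illustrative assumptions, not elements taken from ISO 2146 or RIF-CS.

```python
from dataclasses import dataclass


@dataclass
class RegistryObject:
    """Common base: the metadata shared by every kind of registry object."""
    key: str                  # hypothetical unique key within the registry
    name: str
    description: str = ""

    @property
    def object_class(self):
        # Which of the four specialisations this object is.
        return type(self).__name__.lower()


@dataclass
class Party(RegistryObject):
    """A person or organisation involved in the registry's business."""


@dataclass
class Service(RegistryObject):
    """A service offered through or described by the registry."""


@dataclass
class Collection(RegistryObject):
    """A collection, such as a research dataset."""


@dataclass
class Activity(RegistryObject):
    """An activity, such as a research project."""


objects = [
    Party(key="p1", name="A. Researcher"),
    Collection(key="c1", name="Survey dataset"),
    Activity(key="a1", name="Survey project"),
    Service(key="s1", name="Harvest endpoint"),
]
classes = [obj.object_class for obj in objects]
```

Because every object carries the same base fields, a registry can index and display parties with the same machinery it uses for collections, services and activities, which is the sense in which the party metadata is deliberately kept close to the rest.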
The Research Data Australia portal is still in pilot, but is already open for public view and use.


