Identifier business model

(Written for GBIF LSID/GUID Task Force)

To be taken up by the community and serve its needs, any global identifier system has to provide a few basic services:
 * 1) Clarity - the nature of what is being identified has to be documented.
 * 2) Resolvability - it must be possible to use some Internet protocol to look up identifiers to find out what they mean.
 * 3) Reliability - short-term downtime should be minimized.
 * 4) Longevity - the resolution service needs to survive the reorganization and demise of data publishers.

I will assume that there is a set of organizations that have taken on the responsibility of creating some significant scientific resources and making them available to the larger community. Each resource is a database or repository consisting of a large number of records of some kind, and each record has its own identifier and/or describes one or more entities with their own identifiers. The question of a business model for producing the data itself is therefore out of scope.

Reliable, stable, persistent provisioning of resolution
Reliability of resolution service can be secured through technical replication. Stability and persistence of resolution across organizational changes can be secured through administrative replication - the redundant provision of the same service by distinct organizations.
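
As a rough illustration of how a client could benefit from replication, here is a minimal sketch in Python. It assumes each replica is represented by a function that either returns a URL or raises an error. The names resolve_with_failover, ResolverUnavailable, replica_a and replica_b are hypothetical and exist only for this example. The replicas could belong to one organization (technical replication) or to several (administrative replication); the client logic is the same.

```python
from typing import Callable, Iterable

class ResolverUnavailable(Exception):
    """Raised when a replica cannot answer right now."""

# A resolver maps an identifier to the URL it currently forwards to.
Resolver = Callable[[str], str]

def resolve_with_failover(identifier: str, resolvers: Iterable[Resolver]) -> str:
    """Ask each replica in turn and return the first answer obtained."""
    tried = 0
    for resolver in resolvers:
        tried += 1
        try:
            return resolver(identifier)
        except ResolverUnavailable:
            continue  # this replica is down; try the next one
    raise ResolverUnavailable(
        f"none of the {tried} replicas could resolve {identifier!r}"
    )

# Two hypothetical replicas, imagined as being run by distinct organizations.
def replica_a(identifier: str) -> str:
    raise ResolverUnavailable("replica A is offline")

def replica_b(identifier: str) -> str:
    table = {"13579/1234": "http://acmebio.org/records/1234"}
    return table[identifier]

url = resolve_with_failover("13579/1234", [replica_a, replica_b])
```

In this sketch the identifier still resolves although the first replica is down, which is the property replication is meant to buy.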

Identifier systems are often organized in two tiers, the first providing only forwarding and simple metadata, and the second providing the content itself. The economics of the two tiers are somewhat different.

Data providers, while often highly capable and well financed, are not always equipped to establish technical or administrative replication (of either tier), nor do they necessarily have an incentive to provide such redundancy.

For this reason independent agents are brought in, either by the community of users of a set of identifiers or by content publishers, to assure the quality of the identifier system. Examples of such value-added services include:
 * 1) Handle system, $50/year (?), thousands of publisher/members, 3 staff members
 * 2) Digital object identifier (uses Handle system)
 * 3) Crossref (a nonprofit provider of DOIs), 6 or so staff members

These provide progressively stronger levels of service and service level agreements.

When publishers are available and buy in to the identifier system, the work of curation (at tier 1) can be laid on their shoulders. For example, if an organization changes its domain name from acmebio.org to primobio.org, it can update the metadata associated with its Handle system publisher id (a number, say 13579) to point to the new domain, and change all of the URL metadata fields to point to the new URLs.
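
The bulk update described above might look roughly like the following Python sketch. It is a simplified model, not the Handle system's actual data format or API: the forwarding records are assumed to be a plain mapping from identifier to URL, and the function name move_domain and the record paths are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def move_domain(url_fields: dict[str, str], old_host: str, new_host: str) -> dict[str, str]:
    """Return a copy of the forwarding records with old_host replaced by new_host.

    Only the host part of each URL changes; scheme, path and query are kept.
    Records pointing at other hosts are left untouched.
    """
    updated = {}
    for identifier, url in url_fields.items():
        parts = urlsplit(url)
        if parts.netloc.lower() == old_host.lower():
            parts = parts._replace(netloc=new_host)
        updated[identifier] = urlunsplit(parts)
    return updated

# Hypothetical records: identifier -> URL metadata field.
records = {
    "13579/1234": "http://acmebio.org/records/1234",
    "13579/5678": "http://acmebio.org/records/5678?view=full",
    "24680/42": "http://other.example/items/42",
}

moved = move_domain(records, "acmebio.org", "primobio.org")
```

The identifiers themselves (the keys) never change; only the forwarding targets do, which is what lets users keep citing the same identifier across the domain change.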

If a publisher goes extinct, or loses interest in the identifier system, or doesn't participate in it in the first place, then the user community, if it wants to keep using the identifiers, must pick up the cost of providing forwarding and, if necessary, replicating the content as well (license terms permitting).

The Shared Names system starts with the latter premise: that the data providers don't care about persistent identifiers (within the global Shared Names URI space at least). Any responsibility for forwarding, and for curating the forwarding pointers as a data provider changes its URIs, is in the hands of those who care about the identifier system.

Costs

 * 1) Setup costs: organizing, scripting, education
 * 2) Domain name registration(s), if any
 * 3) Keeping a forwarding service running - redundantly
 * 4) Curating the forwarding service - fixing bad links, speedily, as they're discovered
 * 5) Replicating content (optional)

Skunkworks
A simple replicated forwarding service is not too hard to run, and one approach might be for a bunch of people to get together and just do it in their spare time, on servers they already control, without officially asking for any institutional support.
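
To give a sense of scale, a forwarding service of this kind can be sketched in a few lines using only the Python standard library. The table contents, port number and the names FORWARDING, lookup, ForwardingHandler and serve are assumptions made for this example. A real deployment would also need a shared, curated copy of the table on every replica.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical forwarding table: request path -> current location of the record.
FORWARDING = {
    "/13579/1234": "http://primobio.org/records/1234",
}

def lookup(path: str) -> Optional[str]:
    """Return the forwarding target for a request path, ignoring any query string."""
    return FORWARDING.get(path.split("?", 1)[0])

class ForwardingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = lookup(self.path)
        if target is None:
            self.send_error(404, "Unknown identifier")
            return
        # A temporary redirect, since the target may change after curation.
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()

def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the forwarding service until interrupted."""
    HTTPServer((host, port), ForwardingHandler).serve_forever()
```

Calling serve() starts the service locally. Running the same script with the same table on servers controlled by different people would give a basic form of the replication discussed earlier.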

Absorbed into some organization's operations
Example: OCLC says it will provision purl.org in perpetuity. The organization in question had better be pretty darned stable - like a university library. OCLC fits this description.

Consortium
A group of concerned parties could pool their resources in order to support, say, a half-time or full-time position. These parties might be publishers of identifiers, consumers of identifiers, foundations, or some combination.

Grants
Setup fees for initial programming and scripting could probably be funded through a grant, especially if the identifier system had widespread advance community support. This does not necessarily explain where long-term support will come from, though.

Data provider membership fees
The Handle system, DOI system, and Crossref survive on fees paid by publishers. This is effective because there is a large economy of scale. For the Handle system, the fees are small relative to the cost of publishing, and can be absorbed in the same way that a domain name fee can be. The fees are significantly larger for Crossref DOIs, but the level of service is higher (??).

Obviously data provider fees are of no help to the community if the provider stops paying them. Publishers who withdraw from the Handle system leave behind identifiers that make a lie of the "persistent identifier" idea and leave users of the identifiers high and dry.