Panton Principles for Open Data in Science

The Open Knowledge Foundation has just announced the Panton Principles for Open Data in Science. Here’s the point-form version of the Principles (but do go and read the whole thing, including the concise but important preamble; and please consider endorsing):

Formally, we recommend adopting and acting on the following principles:

  1. When publishing data make an explicit and robust statement of your wishes.
  2. Use a recognized waiver or license that is appropriate for data.
  3. If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.
  4. Explicit dedication of data underlying published science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.

I’ve written elsewhere about my feeling that Open Data/Open Science will eventually need a set of core Declarations to do for the wider movement what the BBB (Budapest, Bethesda, Berlin) definitions have done for Open Access. A set of widely accepted terms and definitions provides a framework within which ongoing discussions can be much more efficient, focused and useful, and also serves as a point of reference and a standard introduction for newcomers to the field. Kudos to OKF and partners for making a strong start in this direction.
I do have one small quibble. Following Peters Suber and Murray-Rust, I want Open licenses to be three things:

  • explicit
  • conspicuous
  • machine-readable

The Panton Principles come right out and say “explicit”, and “machine-readable” is largely covered because the recommended licenses are available in machine-readable versions (though I’d have preferred to see that actual phrase in the text of the Principles). What’s missing, to my mind, is “conspicuous”. The point of Open licensing is to enable and promote re-use, so it’s important to make your license as obvious as possible to potential users. This might seem trivial, but I think it bears spelling out.
My own Open Data mantra is:

  • where are the data?
  • can I have them?
  • what can I do with them?

and again, the PPs are 2 for 3 by my count. The licensing covers whether I can have the data and what I can do with them, but there’s no mention of where I can find them in the first place. When we’re talking about a database, the question doesn’t arise since the license is in the same place as the data. But if we’re talking about data which underlie a published paper, those data are very often not in the same place as the paper, even if the license is there. So it’s important to make sure that your data are available: find or build them a stable online home and then let potential users know where it is. There’s not much point in placing something in the Public Domain if the only copy is on your desktop. I’d have liked to see an explicit discussion of storage, access and signposting in the Principles… though come to think of it, this is really a different (and enormous) set of questions. So perhaps “conspicuous” covers this as well, and the missing Principle is simply that there should be a highly visible link to the license and to the data themselves in every place where the data are used, mentioned or otherwise likely to be encountered.
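
As a rough illustration of what that kind of signposting might look like in machine-readable form, here is a minimal Python sketch. The field names (`data_url`, `license_url`), the helper `missing_signposts` and the example.org address are all made up for illustration; none of them comes from the Panton Principles or from any metadata standard. The only real address is the Creative Commons CC0 deed.

```python
import json

# Hypothetical field names, chosen for this sketch only; they are not part of
# the Panton Principles or of any established metadata standard.
REQUIRED_FIELDS = ("data_url", "license_url")


def missing_signposts(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "title": "Example data set underlying a published paper",
    # Placeholder address: where are the data?
    "data_url": "https://example.org/datasets/my-data",
    # CC0 public domain dedication: can I have them, and what can I do with them?
    "license_url": "https://creativecommons.org/publicdomain/zero/1.0/",
}

# Emit the record as JSON so that it can sit alongside the paper or the data.
print(json.dumps(record, indent=2))

# An empty list means both signposts are present.
print(missing_signposts(record))
```

A record like this only helps if it is published wherever the data are mentioned; the check simply flags a record that has lost either its license link or its data link.
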
Of course, there are always unresolved questions no matter how carefully you craft your Declarations and Statements and Principles, which is why the OKF has wisely built a companion tool, the Is It Open Data? web service. This is a brilliant way to remove ambiguity once and for all, on a case-by-case basis, by making public enquiry into the openness or otherwise of specific data sets. You can browse previous enquiries, so as to avoid redundant questioning of data owners; and naturally, recipients of multiple enquiries can use the service in a different way, simply linking to the record of their first response by way of answer to subsequent queries. Searchability might be a concern once the database of enquiries starts to grow, but that functionality can be added as needed. A central public service for asking questions about data availability and archiving the answers could go a long way towards improving access to data, simply by making clear the level of demand for Openness, and the degree to which supply falls short.
