OA and licensing: why not Public Domain?

This is an unpublished post that’s so old (Aug ’07) that I don’t know why I didn’t just post the damn thing; I’ve forgotten what I was intending to do with it. I’m posting it now because it contains pointers to useful thinking by David Wiley and others that is germane to the ongoing discussion of data licensing (see post below). I was reminded of this old draft of mine by Deepak’s comment that copyleft may be harmful in the case of scientific data, a point David also makes in respect of his particular Open area, education. Much of what David says maps readily from his field to research, so without further ado:
David Wiley of Iterating Toward Openness has been blogging up a storm about open content licensing:

That’s a lot to read, but it’s all good stuff. David makes one very strong argument that I want to emphasize here, because it points up the difficult distinction between data and (creative) work.
In the post introducing his draft Open Education Licence, he provides a very useful outline of the aims of open content:

  • Reuse – Use the work verbatim, just exactly as you found it
  • Rework – Alter or transform the work so that it better meets your needs
  • Remix – Combine the (verbatim or altered) work with other works to better meet your needs
  • Redistribute – Share the verbatim work, the reworked work, or the remixed work with others

I really, really like that. David’s “four R’s” resemble the four fundamental freedoms of the Free Software Foundation but do a better job of discriminating between Rework and Remix. The four R’s make immediate sense to me and I will certainly be Reusing and Redistributing that idea.
David goes on to quote some believable numbers and points out that:

Since half of all CC licensed materials are licensed using a copyleft clause and all GFDL licensed materials are licensed using a copyleft clause, this means that over half of the world’s open content is copylefted. And while the CC and GFDL copyleft clauses guarantee that all derivative works will be “open,” they also guarantee that they can never be used in remixes with the majority of other copylefted works. You can’t remix a GFDL work with a By-NC-SA work when the licenses require that the child be licensed exactly as the parent. Each parent had one and only one license – which license would the derivative use? It’s just not possible to legally remix these materials; copyleft prevents this remixing. [see David’s earlier explanation for details of the incompatibilities among various copyleft licenses]
While promoting rework at the expense of remix – in other words, taking the copyleft approach – is fine for software, it is problematic for content and extremely problematic for education. As educators, we are always remixing materials for use in our classrooms both in the “real” world and online. Your mileage may vary, but over my last 15 years of teaching I would estimate that my remixing activities outnumber my reworking activities 10:1 or more. If other teachers are like me in this regard, then, copyleft is a huge problem for open education.
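
The incompatibility David describes can be sketched as a toy model. This is only an illustration under loud assumptions: the `License` class and `can_remix` function are hypothetical names invented here, each license is reduced to a single "copyleft" flag, and any explicit compatibility provisions a real license may contain are ignored. It is not legal advice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    """Hypothetical, heavily simplified stand-in for a content license."""
    name: str
    copyleft: bool  # True if derivatives must carry exactly this license


def can_remix(*parents: License) -> bool:
    """Toy rule: a remix is possible only if its parents demand at most
    one distinct copyleft license, since the child can carry only one."""
    required = {lic.name for lic in parents if lic.copyleft}
    return len(required) <= 1


# Illustrative values only; real licenses have many more terms.
GFDL = License("GFDL", copyleft=True)
BY_NC_SA = License("CC By-NC-SA", copyleft=True)
BY = License("CC By", copyleft=False)

print(can_remix(GFDL, BY_NC_SA))  # two different copyleft parents
print(can_remix(GFDL, BY))        # one copyleft parent, one permissive
print(can_remix(GFDL, GFDL))      # same copyleft license on both parents
```

In this simplified model the first call is the case David objects to: each parent insists on its own license for the child, so no single license satisfies both.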

It’s potentially a huge problem for scientists, too, because much of the potential of Open Science and Open Data (see here for an attempt at defining those terms) is in Remix. There are answers in existing datasets to questions their creators never thought to ask; as Alma Swan put it,

…exciting new developments in text-mining and data-mining are beginning to show what can be done to create new, meaningful scientific information from existing, dispersed information using computer technologies. Research articles and accompanying data files can be searched, indexed and mined using semantic technologies to put together pieces of hitherto unrelated information that will further science and scholarship in ways that we have yet to begin imagining.

This is why I join Peter Murray-Rust in being against copyleft for data:

I am not in favour of copyleft for data. I have no fundamental objection to creating a copyrighted work from data as long as there is significant added value. And copyleft is viral – deliberately. If any item in a system/collection/program etc. is copyleft, then the whole is (at least by the algorithm). […]
I would argue that if I get factual information from WP [wikipedia] then it cannot carry a copyleft. I need the fundamental physical constants and get them from WP. I don’t think that my data and programs are thereby copyleft. All algorithms are now slightly fuzzy.

So what do we mean by “data”? What I mean is “facts about the world of sense-perception”, as distinct from the presentation and interpretation of those facts. So I might not be free to reproduce, say, a scan of a Western blot from a published paper — but having looked at that image, I had better be completely free to do whatever I like with the information it gives me about the way the world works, or else science will grind to a halt. Similarly, if a review article (which contains no new facts, and is all reuse and remix) brings together the results of a number of studies to create new information, or a new hypothesis, about the way the world works, I am not free to copy the wording but I must be free to go into my lab and test the hypothesis.
See also (this was a note to myself in the draft, so caveat lector!):
CC-NC considered harmful (Kuro5hin)
When is OA not OA? (Catriona MacCallum in PLoS Biology)
CC, OA and moral rights (Thinh Nguyen, Science Commons blog)
Open Data and Moral Rights (Peter Murray-Rust)

Data are difficult.

Scientific data are not only hard to come by, they’re almost as hard to share, mainly because the scientific infrastructure is armpit-deep and sinking fast in the quicksand of patents, copyrights and ever-multiplying licenses. See Peter Murray-Rust, Antony Williams and Egon Willighagen for the latest dust-up over data licensing; I just want to point out this clear-eyed commentary by John Wilbanks:

The public domain is not an “unlicensed commons”. The public domain does not equal the BSD. It is not a licensing option.
It is the natural legal state of data.
It is a damn shame that we no longer think of the public domain as an option that is attractive. It’s a sign of the victory of the content holders that the free licensing movements work against that something without a license — something that is truly free, not just free “as in” — is somehow thought to be worse. We’ve bought into their games if we allow the public domain to be defined as the BSD. The idea of the public domain has been subjected to continuous erosion thanks to both the big content companies and our own movements, to the point where we think freedom only comes in a contract.
The public domain is not contractually constructed. It just is. It cannot be made more free, only less free. And if we start a culture of licensing and enclosing the public domain (stuff that is actually already free, like the human genome) in the name of “freedom” we’re playing a dangerous game.
There’s a lot more to get at here.

Yes, there is, and you should read the rest of that entry (and keep up with John’s blog) if you’re at all interested. I’ll add just one brief comment: back when John’s current job was first advertised, I considered applying for it. Not that I thought I was qualified, but I hoped that perhaps Science Commons would want to hire an offsider of some sort for the new director. Having had a couple of years to start learning a bit about Open Access and Open Science, I would venture to say that we are all better off with me in the cheerleading section instead of on the field.