This is an unpublished post that’s so old (Aug ’07) that I don’t know why I didn’t just post the damn thing; I’ve forgotten what I was intending to do with it. I’m posting it now because it contains pointers to useful thinking by David Wiley and others that is germane to the ongoing discussion of data licensing (see post below). I was reminded of this old draft of mine by Deepak’s comment that copyleft may be harmful in the case of scientific data, a point David also makes in respect of his particular Open area, education. Much of what David says maps readily from his field to research, so without further ado:
David Wiley of Iterating Toward Openness has been blogging up a storm about open content licensing:
- Noncommercial Isn’t the Problem, ShareAlike Is
- ShareAlike, the Public Domain, and Privileging
- Copyleft and Fish in Water
- Open Education License Draft
- Assymetry, Hypocrisy, and Public Domain
- Why Not CC By?
That’s a lot to read, but it’s all good stuff. David makes one very strong argument that I want to emphasize here, because it points up the difficult distinction between data and (creative) work.
In the post introducing his draft Open Education License, he provides a very useful outline of the aims of open content:
- Reuse – Use the work verbatim, just exactly as you found it
- Rework – Alter or transform the work so that it better meets your needs
- Remix – Combine the (verbatim or altered) work with other works to better meet your needs
- Redistribute – Share the verbatim work, the reworked work, or the remixed work with others
I really, really like that. David’s “four R’s” resemble the four fundamental freedoms of the Free Software Foundation but do a better job of discriminating between Rework and Remix. The Four R’s make immediate sense to me and I will certainly be Reusing and Redistributing that idea.
David goes on to quote some believable numbers and points out that:
Since half of all CC licensed materials are licensed using a copyleft clause and all GFDL licensed materials are licensed using a copyleft clause, this means that over half of the world’s open content is copylefted. And while the CC and GFDL copyleft clauses guarantee that all derivative works will be “open,” they also guarantee that they can never be used in remixes with the majority of other copylefted works. You can’t remix a GFDL work with a By-NC-SA work when the licenses require that the child be licensed exactly as the parent. Each parent had one and only one license – which license would the derivative use? It’s just not possible to legally remix these materials; copyleft prevents this remixing. [see David’s earlier explanation for details of the incompatibilities among various copyleft licenses]
While promoting rework at the expense of remix – in other words, taking the copyleft approach – is fine for software, it is problematic for content and extremely problematic for education. As educators, we are always remixing materials for use in our classrooms both in the “real” world and online. Your mileage may vary, but over my last 15 years of teaching I would estimate that my remixing activities outnumber my reworking activities 10:1 or more. If other teachers are like me in this regard, then, copyleft is a huge problem for open education.
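David's "which license would the derivative use?" question can be made concrete with a toy model. What follows is only a sketch under loud assumptions: the `License` class, the four-entry license table and the `can_remix` function are all hypothetical names of mine, and the single rule encoded (every copyleft parent demands that the child carry exactly that parent's license) is a deliberate simplification of the real license texts, which have conditions this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    name: str
    copyleft: bool  # True if derivatives must carry exactly this license


# Hypothetical, heavily simplified license table (an assumption for illustration).
GFDL = License("GFDL", copyleft=True)
BY_NC_SA = License("CC By-NC-SA", copyleft=True)
BY_SA = License("CC By-SA", copyleft=True)
BY = License("CC By", copyleft=False)


def required_child_licenses(parents):
    """Collect the distinct licenses that copyleft parents force onto a remix."""
    return {p.name for p in parents if p.copyleft}


def can_remix(parents):
    """A remix can carry only one license, so it is possible (in this toy
    model) only if the copyleft parents demand at most one distinct license."""
    return len(required_child_licenses(parents)) <= 1


if __name__ == "__main__":
    for combo in ([GFDL, BY_NC_SA], [BY_SA, BY_SA], [GFDL, BY], [BY, BY]):
        names = " + ".join(p.name for p in combo)
        print(f"{names}: {'ok' if can_remix(combo) else 'blocked'}")
```

In this model the GFDL plus By-NC-SA pairing is blocked because the two parents each insist on a different child license, while any mix involving at most one copyleft license goes through. That is the asymmetry in a nutshell: copyleft leaves rework untouched but makes remix depend on every source happening to share the same license.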
It’s potentially a huge problem for scientists, too, because much of the potential of Open Science and Open Data (see here for an attempt at defining those terms) is in Remix. There are answers in existing datasets to questions their creators never thought to ask; as Alma Swan put it,
…exciting new developments in text-mining and data-mining are beginning to show what can be done to create new, meaningful scientific information from existing, dispersed information using computer technologies. Research articles and accompanying data files can be searched, indexed and mined using semantic technologies to put together pieces of hitherto unrelated information that will further science and scholarship in ways that we have yet to begin imagining.
This is why I join Peter Murray-Rust in being against copyleft for data:
I am not in favour of copyleft for data. I have no fundamental objection to creating a copyrighted work from data as long as there is significant added value. And copyleft is viral – deliberately. If any item in a system/collection/program etc. is copyleft, then the whole is (at least by the algorithm). […]
I would argue that if I get factual information from WP [wikipedia] then it cannot carry a copyleft. I need the fundamental physical constants and get them from WP. I don’t think that my data and programs are thereby copyleft. All algorithms are now slightly fuzzy.
So what do we mean by “data”? What I mean is “facts about the world of sense-perception”, as distinct from the presentation and interpretation of those facts. So I might not be free to reproduce, say, a scan of a Western blot from a published paper — but having looked at that image, I had better be completely free to do whatever I like with the information it gives me about the way the world works, or else science will grind to a halt. Similarly, if a review article (which contains no new facts, and is all reuse and remix) brings together the results of a number of studies to create new information, or a new hypothesis, about the way the world works, I am not free to copy the wording but I must be free to go into my lab and test the hypothesis.
See also (this was a note to myself in the draft, so caveat lector!):
- CC-NC considered harmful (Kuro5hin)
- When is OA not OA? (Catriona MacCallum in PLoS Biology)
- CC, OA and moral rights (Thinh Nguyen, Science Commons blog)
- Open Data and Moral Rights (Peter Murray-Rust)