Does the “green road”1 lead off a cliff?

Further to my complaints about the copyright thicket in which data are being lost, Charles W Bailey Jr points out that, in fact, it’s worse than that: a good deal of the potential functionality of existing Open Access archives is jammed up in the same thicket:

If… repositories could not be trusted, then libraries would have to attempt to archive the postprints in question themselves; however, since postprints are not by default under copyright terms that would allow this to happen (e.g., they are not under Creative Commons Licenses), libraries may be barred from doing so.

(Emphasis mine.) Charles is talking about the question of whether or not self-archiving of scholarly articles (the “green road” to Open Access) will cause libraries to cancel journal subscriptions. I touched on this issue in an earlier entry, and don’t want to revisit it here. What interests me here is the fact — which I initially had trouble grokking, as you’ll see if you read the comments on Charles’ entry, where he patiently explains it — that digital objects in Open Access repositories carry their own copyrights, rather than being covered by a blanket license provided by the repository.  For instance, PubMed Central refers to Open Access (using the Bethesda Statement), and then says:

Note that this definition of open access goes beyond the simple free access that applies to all full-text content viewable directly in PubMed Central (PMC) from the National Institutes of Health (NIH).

A number of PMC journals make all or most of their contents available as open access publications. See the Open Access list for details.

So PMC is OAI-PMH-compliant, but contains digital objects that are not themselves Open Access. I suspect the same is also true of the majority of institutional and centralized repositories (though I only checked ePrintsUQ, arXiv.org and Cogprints, none of which make any mention of copyright at all).

To get an idea of what that actually means, read carefully this brief discussion by Peter Suber of the BBB definition of Open Access:

The best-known part of the BBB definition is that OA content must be free of charge for all users with an internet connection. However, the BBB definition doesn’t stop at free online access. It adds an extra dimension that isn’t as easy to describe, and consequently is often dropped or obscured. This extra dimension gives users permission for all legitimate scholarly uses. It removes what I’ve called permission barriers, as opposed to price barriers. The Budapest statement puts the extra dimension this way:

By “open access” to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.

The Bethesda and Berlin statements put it this way: For a work to be OA, the copyright holder must consent in advance to let users “copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship”.

All three tributaries of the mainstream BBB definition agree that OA removes both price and permission barriers. Free online access isn’t enough. “Fair use” (“fair dealing” in the UK) isn’t enough.

Because each digital object carries its own copyright, e-print repositories do not remove permission barriers.  Here’s Peter Suber again:

Permission barriers are more difficult to discuss than price barriers.  First, there are many kinds of them, some arising from statute (copyright law), some from contracts (licenses), and some from hardware and software (DRM).  They are not like prices, which differ only in magnitude.  Second, their details are harder to discover and understand.  Third, different users in different times, places, institutions, and situations can face very different permission barriers for the same work.  Fourth, authors who deposit their articles in open-access archives bypass permission barriers even if they also publish the same articles in conventional journals protected by copyright, licenses, and DRM. 

As far as I can tell, that fourth point is simply not true of any existing archives.  If you want to do anything with an article in, say, PubMed Central, other than simply read it — if you want to copy it and distribute the copies, if you want to make a derivative work, if you want to pass it to text-mining or other software — you will have to determine, on an article-by-article basis, whether you are allowed to do that. 

Take, for example, the following paper from the lab I work in, available free from PubMed Central:

Deletion of Mnt leads to disrupted cell cycle control and tumorigenesis.
Peter J. Hurlin, Zi-Qiang Zhou, Kazuhito Toyo-oka, Sara Ota, William L. Walker, Shinji Hirotsune, and Anthony Wynshaw-Boris

Right above the title on the linked page is a copyright notice: “Copyright © 2003 European Molecular Biology Organization”.  The link provided goes to a PMC page which makes it very clear that an article’s presence in PMC tells you nothing about what rights the copyright holder(s) reserve or waive.  Searching the EMBO site for “copyright” brings up nothing useful, but the EMBO Journal (which is actually part of Nature Publishing Group) has this to say:

Nature Publishing Group does not require authors of original research papers to assign copyright of their published contributions. Authors grant NPG an exclusive licence to publish, in return for which they can re-use their papers in their future printed work. NPG’s author licence page provides details of the policy and a sample form. Authors are encouraged to submit their version of the accepted, peer-reviewed manuscript to their funding body’s archive, for public release six months2 after publication. In addition, authors are encouraged to archive their version of the manuscript in their institution’s repositories (as well as on their personal web sites), also six months after the original publication.

Apart from the foul six-month embargo (Do you have any idea how many experiments I can do in six months?  But I digress.), this seems reasonable, and it leaves permissions up to the authors.  So “copyright EMBO” is misleading, and it’s likely that EMBO J authors, having reposited their articles, wish them to be fully Open Access.  As it happens, in this case the corresponding author is my boss so I can assure you that he knows about Open Access and is all in favour. The point, though, is that you have to dig around to find out that it’s up to Peter, and then you have to contact him to find out that he fully intends you to have the permissions you need. You are not going to be able to do that for more than a handful of papers; it certainly puts an effective brake on text-mining.
I think this brief example makes clear that, in practice, you cannot do anything much with repository content but read it (“fair use”, of course, still applies).  You simply don’t have the time to uncover the necessary permissions for anything else.  Which in turn means that there are no, or very few, actual Open Access repositories currently in existence.
I’ll say it again: e-print repositories do not provide Open Access.  They provide free access to human eyes, one paper at a time; as the accepted definitions make clear, that’s not at all the same thing.  Since self-archiving in such repositories is the current focus of many, if not most, efforts to provide 100% Open Access to the world’s scholarly literature, this is a big deal.  There are two obvious solutions: 1, ignore the whole issue; and 2, start applying labels to digital objects.
In the short term and for individual researchers, solution 1 has considerable appeal.  There’s even precedent: a recent study pointed out that patents do not slow research down much, mostly because researchers ignore them.  The majority of e-prints are probably in a repository because their authors want Open Access; the likelihood of running afoul of copyright and actually being called to account for it seems pretty low.  I think, however, that this head-in-the-sand approach is a very bad idea.  What authors want is not always what counts, as when the copyright is actually owned by a publisher.  I’ve been trying to think of the kinds of things you might do with a body of OA literature — build a text-mining robot that offers novel ways to look for deep connections between ideas and among data, make a local database of papers on your research specialty, and so on — but in fact, much of the point of Open Access is to make possible things I cannot think of.  Look what the Web has made possible, and ask yourself: how much of that could I have predicted in 1991?  It seems to me that anything which makes use of a substantial number of papers, or relies on being able to mine an entire corpus, runs the risk of being shut down or co-opted just when it starts to get interesting and useful.  Suppose, for instance, that I write that text-mining robot: while I am using it to feed ideas into my own benchwork, I’m OK, but the minute I give that robot to someone (or, as is my preference, everyone) else, I run the risk of being sued for copyright violations.

This is the same risk that researchers are already running when using patented technology without a license; you are fine until you come up with something good, but then if the patent owner notices what you’ve done, you can be in trouble.   “Trouble” means three things: legal sanctions, loss of the opportunity to profit from your invention, and removal of your invention from the commons.  The first seems pretty unlikely from an individual perspective — what company is going to risk the PR nightmare of trying to recover fines from a researcher? — but substantially more worrisome for universities and other institutions.  The “loss of profit” is of no interest to me; if I wanted to be rich I wouldn’t be a scientist.  What really concerns me is the potential for patent/copyright owners to exert anti-commons, profit-taking control over research outcomes, and it’s this risk that makes the Ostrich Option unacceptable to me.

In the longer term, for community-minded researchers and especially for institutions (which are typically more wary of litigation than individual researchers and, since Bayh-Dole, increasingly focused on profiting from research outcomes), solution 2 is a reasonable fix.  In principle, OA repositories could include labels (that is, metadata) specifying which uses are explicitly permitted or prohibited, so search engine users and text-mining robots could search only that portion of the database that grants whatever rights they need.  In fact, the Bethesda and Berlin definitions of OA both include the requirement for every OA article to carry an explicit label regarding permissions.  Project RoMEO was intended to deal with precisely this issue, and produced (in addition to the valuable SHERPA/RoMEO database of publisher permissions for forward-looking authors) six surveys of the field and an XML-based implementation of the resulting rights management concepts, incorporating Creative Commons licenses. Unfortunately, there seems to have been zero uptake of the concepts or the technical implementation.  As far as I can tell there are no search interfaces which provide this kind of rights-based functionality, and every repository contains a mix of well-labelled, partially labelled and unlabelled objects.  In addition, the body of scholarly work in relevant repositories is already so large that adding the necessary rights metadata is an enormous task, one which grows larger and more forbidding by the day (I might call this the “backlog problem”).
Nonetheless, the fundamental OA definitions include rights beyond simple reading access for good reason.  As I discussed in my earlier entry, rights management is going to be at the heart of Open Data, and I have argued elsewhere that licensing and standards/metadata are also going to be crucial to bringing the “openness” of Open Access to science as a whole.  I think the Open Science field is headed for some serious problems if permissions barriers are not given more attention.   I might concede that the most important thing to achieve right now is removal of access barriers to human eyeballs, but why make trouble for ourselves by — as seems to be happening3 — ignoring the rights issue?  There’s no reason why the process of encouraging authors to self-archive, and building tools to make that easier, should not include information and tools that focus on rights management.  At the very least, we should be making authors who are already on-side, who are self-archiving and using the SPARC Author Addendum and so on, aware of the issue — and giving them the tools to label their own papers with clear statements of the rights they wish to retain or waive. At least then the rate of growth of the backlog problem will begin to slow down, and should approach zero as we approach 100% OA (even on the green road) rather than continuing to grow unchecked.
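
To make the labelling idea concrete, here is a minimal sketch in Python of how a search tool or text-mining robot might restrict itself to repository objects whose labels explicitly grant the rights it needs. Everything in it is an assumption made for illustration: the `Eprint` record layout, the four-word rights vocabulary and the `usable_for` helper are hypothetical, and are not drawn from Project RoMEO, OAI-PMH or any existing repository software.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rights vocabulary, for illustration only. Real schemes
# (for example Creative Commons metadata) are considerably richer.
READ = "read"
COPY = "copy"
DERIVE = "derive"
MINE = "text-mine"


@dataclass
class Eprint:
    """A repository record carrying an optional rights label (assumed layout)."""
    identifier: str
    title: str
    # None means "unlabelled": nothing beyond reading can safely be assumed.
    permitted: Optional[frozenset] = None


def usable_for(records, needed):
    """Return only the records whose label explicitly grants every needed right."""
    needed = frozenset(needed)
    return [
        r for r in records
        if r.permitted is not None and needed <= r.permitted
    ]


records = [
    Eprint("ex:1", "Labelled, fully open", frozenset({READ, COPY, DERIVE, MINE})),
    Eprint("ex:2", "Labelled, read and copy only", frozenset({READ, COPY})),
    Eprint("ex:3", "Unlabelled"),
]

# A text-mining robot that also wants to redistribute copies.
minable = usable_for(records, {COPY, MINE})
print([r.identifier for r in minable])
```

Note that the unlabelled record is excluded from every query, even one that asks only for reading rights. That is the backlog problem in miniature: an object with no label is invisible to any tool that tries to respect permissions.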

Respect. (As I hear the kids say.)

Glyn Moody is re-tagging all his old posts, so subscribers to his RSS feed are getting a quick run through his blogging history. If you have any interest in Open Source or Open Science, check him out.
To whet your appetite: today he re-tagged a post pointing to a story that was posted to LWN.net in March, on Project Gutenberg founder Michael Hart (wikipedia, Poynder interview, PG about: page, blog of sorts). Hart is quite a character (as seems common among visionaries), and the linked resources make interesting reading (especially Hart’s own writing). What really grabbed my attention was this detail from Glyn’s article:

Even 20 years after Project Gutenberg had begun, Hart had only created 10 ebooks.

That was my “holy crap” moment for the day. Think about it: it’s 1971, what will become the Internet consists of 15 nodes and about 100 people, Sir Tim won’t invent the Web for another 20 years, and you are given an account on one of those nodes. What will you do with it? Well, if you’re Michael Hart, you will see forward more than a quarter of a century and begin Project Gutenberg, and then for well over twenty years you will be virtually its sole proponent and defender. In 1997, PG had 313 ebooks. In 1998, collaboration with the University of Illinois PC User Group finally set the wheels in motion for the creation of the PG we all know and love today; by the end of that year there were 1600 ebooks in the collection, and today there are 20,000. The clarity of that original vision and the tenacity with which Hart made it a reality are simply breathtaking.

Jan Vermeer to his Model.

Jan Vermeer to his Model
(Girl with Pearl Earring, ca. 1665)

All the light at my command is in this brush:
I bid a skyful crowd into a pip
and place, with painter’s hand and lover’s touch,
reflections in your eyes and on your lip.
I pour the day like water from the side,
and caught between the woman and the girl,
as where a twilight and the sea collide,
I find these careful shadows for your pearl.
I have no words for this, I cannot name
the strange sense of a flower in your face;
but I can paint the way it waits to bloom,
and stop time on this cusp of quiet grace.

We just watched Girl With A Pearl Earring, the movie based on Tracy Chevalier’s debut novel (which now I think I’d like to read). The movie is very pretty — too pretty at times for its less glamorous subject matter, but beautifully evocative of Golden Age Dutch art in the scenes where it matters. It builds a fine, slow suspense, and if the eponymous painting has ever held you in its famous spell you will enjoy the way the film treats it.
I dug this verse out and decided it wasn’t completely worthless — mainly for the third stanza. It’s not known who the model really was; when I wrote this I had in mind the most popular theory, that she was Vermeer’s eldest daughter, and my own idea (which Chevalier apparently shares) that the painting has too much of a sexual undertone for that to make complete sense. Let me know what you think.

Where are the data? Can I have them? What can I do with them?

There’s a new subversive proposal in town.  The original was Stevan Harnad’s landmark call for self-archiving of the scientific (“esoteric”) literature (see here for a ten-year update, and here for context).  Now, 12 years later, Open Access is gathering momentum and forward-looking advocates of knowledge as a public good are thinking about Open Data (some extra background here).  Peter Murray-Rust recently stepped up with a subversive proposal of his own:

The simplest thing that researchers can do [to promote Open Data] is to add a Creative Commons license to their data. It costs nothing, is a simple cut-and-paste, and could be trivially made a template in any data production tool. […]
I think the effect of this would be dramatic. Scientists would start to see these messages and think: “Why should I give these data to the publisher?” And if the publisher simply adds a copyright notice saying “all these data are copyright the publisher – you cannot use them for X, Y, Z without permission” this would be in violation of the authors’ license. The author would have to deliberately remove this statement to hand over the IPR to the publisher.

I think Peter’s proposal is a good one, similar in form and effect to the SPARC author addendum.  Importantly, Science Commons also offers author addenda, and will soon offer them in the machine-, human- and lawyer-readable versions that come with all Creative Commons licenses; as Peter notes, the machine-readable version is crucial to full Open Data utility.  Use of the proposed Open Data addendum (in combination, where necessary, with an Open Access addendum) would clarify the legal status of an author’s data, provided we get the wording right.  Herewith some thoughts on how to do that, based on the questions in the title.

First, note that papers do not usually contain raw (useful, useable) data. They contain, say, graphs made from such data, or bitmapped images of it — as Peter says, the paper offers hamburger when what we want is the original cow.  Chris Surridge of PLoS puts it this way:

A figure in a paper is a way of representing the raw data in such a way to best illustrate the point the author is making. A figure then is the product of an operation upon the raw data, and that operation results in a loss of information.
The raw data could have been presented in a host of different ways possibly supporting other conclusions not thought of by the author. Equally if a reader had raw data compatible with that the author obtained wouldn’t it be useful if it could be processed in the same way for comparison? Wouldn’t it be much better for readers to have access not only to the figures in a paper but also to the underlying data and the transform that created it. In this way no information, neither implicit nor explicit, is lost.

So if authors want to make their data openly and usefully available, they will need to host it themselves or find someone to host it for them.  Many journals will host supplementary information, and many institutional repositories will take datasets as well as manuscripts.  I have been saying for some time that it should by now be de rigueur to make one’s raw data available with each publication. This is very rarely done — even supplementary information, when I have come across it, tends to be of the hamburger-rather-than-cow variety and so not very useful.  (The situation speaks sad volumes about the emphasis on competition over cooperation within the scientific community and, perhaps in many cases, about the quality of the raw data in question, if only one were ever able to see it; but I digress.)  Thus an effective Open Data addendum will first have to answer the question: where are the data?

Second, there is the issue of licensing (“Can I have them?  What can I do with them?”).  In comments on Peter’s proposal, Jonathan Eisen observes that publishing in Open Access journals should provide open access to data as well.  Peter replies that this is not always the case and points to Molbank as a problematic example, because they require a copyright transfer and it is simply not clear what rights they claim over raw data.  In fact, the situation is even worse.  In the same entry, Peter points approvingly to the BioMed Central OA charter, which is based on the Bethesda Statement:

Every peer-reviewed research article appearing in any journal published by BioMed Central is ‘open access’, meaning that:

  1. The article is universally and freely accessible via the Internet, in an easily readable format and deposited immediately upon publication, without embargo, in an agreed format – current preference is XML with a declared DTD – in at least one widely and internationally recognized open access repository (such as PubMed Central).
  2. The author(s) or copyright owner(s) irrevocably grant(s) to any third party, in advance and in perpetuity, the right to use, reproduce or disseminate the research article in its entirety or in part, in any format or medium, provided that no substantive errors are introduced in the process, proper attribution of authorship and correct citation details are given, and that the bibliographic details are not changed. If the article is reproduced or disseminated in part, this must be clearly and unequivocally indicated.

But what does that mean for Open Data?  Take any paper in any BMC journal: where are the data?  Can I have them?  What can I do with them?  It is probably true that authors who publish in BMC are amenable to giving me the data and letting me do with them as I please, but that is simply not enough.  I need unfettered access to the data at the same time as I access the paper.  Even as a human I don’t have time to chase down permission for every dataset I want to re-use, and if I’m data-mining by web crawler I need machine-readable licenses that tell my robot what it can have.  Policies regarding data and materials are journal-specific within the BMC group, but I browsed a few and it seems they all use a standard template, which includes the following:

Submission of a manuscript to [BMC Journal in question] implies that readily reproducible materials described in the manuscript, including all relevant raw data, will be freely available to any scientist wishing to use them for non-commercial purposes. Nucleic acid sequences, protein sequences, and atomic coordinates should be deposited in an appropriate database in time for the accession number to be included in the published article. In computational studies where the sequence information is unacceptable for inclusion in databases because of lack of experimental validation, the sequences must be published as an additional file with the article. [There follows a list of databases that can be used to deposit nucleotide and protein sequences and structures, chemical structures and assays, microarray data, computer models and plasmids.]

Note though that these policies are not strict demands, and I’ll bet they are not policed in any way.  I think most journals include similar language in their instructions to authors, and have done for some time, but we still do not have widespread Open Data.  Further, the actual BMC license (which BMC says is identical to the Creative Commons Attribution License) refers only to “the work” which it defines as “the copyrightable work of authorship offered under the terms of this License”.  That seems to me to allow an interpretation that excludes data, which sit in the grey zone between creative works that can be copyrighted and, er, things (like gene sequences and chemical structures of drugs) that can be patented.

So how about Public Library of Science and Hindawi, the other major OA publishers?  Well, Hindawi seems to say nothing about data whatsoever, only that authors retain copyright and articles are published under a CC Attribution license.  PLoS also publishes everything under a CC Attribution license, which says nothing about data, but if you dig a bit you find encouraging things in the editorial/publishing policies:

Publication is conditional upon the agreement of authors to make freely available any materials and information associated with their publication that are reasonably requested by others for the purpose of academic, noncommercial research.
Data Availability
Open access applies to both the scientific literature and the data used to establish that literature. Publication is contingent on making data integral to a manuscript freely available without restriction, provided that appropriate attribution is given and that suitable mechanisms exist for sharing the data used in a manuscript.

  1. Data for which public repositories have been established that are in general use should be deposited before publication, and the appropriate accession numbers or digital object identifiers published with the paper.
  2. If an appropriate repository does not exist, data should be provided as supporting information with the published paper. If this is not practical, data should be made freely available upon reasonable request.
  3. The conclusions of a study must not be dependent solely on the analysis of proprietary data. If proprietary data were used to reach a conclusion, and the authors are unwilling or unable to make these data public, then the paper must include an analysis of public data that validates the conclusions so that others can reproduce the analysis and build on the findings.

Note that any restrictions on the availability or on the use of datasets might be judged to diminish the significance of a paper and will therefore influence the decision about whether a paper should be published. These policies have been developed in accordance with the principles established in Sharing Publication-Related Data and Materials (National Academies Press, 2003).

That’s better, stronger language — but why is there no mention of data in the actual license, and why is there a need for warnings about restrictions that “might be judged to diminish the significance, etc” if publication is truly conditional on open access to data?  I suspect another toothless tiger.  It’s not that I want the tiger to have teeth, that is, for journals to actively police data availability, but that I wonder why I have to go digging around the website just to find this wishy-washy nod in the general direction of Open Data.  To illustrate my point here, suppose I read a paper in PLoS Biology, and I want to get my hands on some raw data from that paper: where are they?  Can I have them?  What can I do with them?  All of these things are, basically, left up to the authors. 

Now remember that these highly unsatisfactory examples are drawn from the most prominent Open Access publishing houses, which might be expected to be much more supportive of Open Data than commercial traditional publishers.  Thus the power of Peter’s Open Data addendum becomes apparent: it is attached directly to the paper, so readers do not have to go hunting through journal websites to find out the intellectual property status and location of interesting datasets.  It allows authors to take control.

To be effective, then, an Open Data addendum must at least answer my opening questions: it must point to the online, freely accessible location of the raw, un-hamburgered data; it should make clear that yes, you can have them; and it should state clearly what you can do with them.  The last question probably requires the creation of multiple addenda, since some people (like Jonathan Eisen) will want to effectively copyleft their data, whereas others will prefer less restrictive licenses.  My preferred answer is “anything you want, so long as you do not remove information or materials from the scientific commons”.

So, finally, let me take a stab at a draft Open Data addendum.  This is based on, and largely copied from, the SPARC author addendum, and my idea is that it should, like (and if necessary with) the SPARC addendum, be submitted to the publisher together with their publication agreement.

AUTHOR’S ADDENDUM TO PUBLICATION AGREEMENT

THIS ADDENDUM hereby modifies and supplements the attached Publication Agreement concerning the following Article:

[manuscript title]

and the following Raw Data from which the Article was prepared:

[list of data sets, including permanent web address/es from which they can be obtained]

The parties to the Publication Agreement and to this Addendum are:

[list of authors, indicating corresponding author] (individually, or if more than one author, collectively, the Author), and

[publisher].

The parties agree that wherever there is any conflict between this Addendum and the Publication Agreement, the provisions of this Addendum are paramount and the Publication Agreement shall be construed accordingly.  Notwithstanding any terms in the Publication Agreement to the contrary, AUTHOR and PUBLISHER agree as follows:

1. Author’s Retention of Rights. In addition to any rights under copyright retained by Author in the Publication Agreement, Author retains all rights to the Raw Data underlying the Article, including but not limited to: (i) the rights to reproduce, distribute and publicly display the Raw Data in any medium; and (ii) the right to authorize others to make any use of the Raw Data so long as Author receives credit as author and the journal in which the Article has been published is cited as the source of first publication of the Article and Raw Data.

2. Licensing of Raw Data.  Author hereby releases the Raw Data under the terms of a Creative Commons Attribution Share-Alike License [or insert whatever license you prefer], where “the work” is understood to mean the data sets listed above.  Publisher agrees to include in the Article this statement of licensing terms and the above list of data sets and web address/es from which they can be freely obtained.

3. Publisher’s Acceptance of this Addendum. Author requests that Publisher demonstrate acceptance of this Addendum by signing a copy and returning it to the Author. However, in the event that Publisher publishes the Article in the journal identified herein or in any other form without signing a copy of the Addendum, Publisher will be deemed to have assented to the terms of this Addendum.

That’s not perfect, not by a long shot — most especially not for automated data mining, which requires machine-readable metadata and data. It should, however, do what Peter suggests: provide some relief from endless rounds of find-the-permissions, and get a much-needed conversation underway.
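
As a rough illustration of the machine-readable side, the sketch below (Python) imagines a small JSON companion to the addendum that answers the three title questions for a robot: where the data are, whether it can have them, and on what terms. This is only a sketch under stated assumptions. The JSON field names, the `fetchable_datasets` helper, the example.org addresses and the list of licences the robot accepts are all hypothetical, and none of them comes from SPARC, Science Commons or any publisher.

```python
import json

# Hypothetical machine-readable companion to the addendum. The field names
# are invented for illustration; they are not part of any existing standard.
ADDENDUM_JSON = """
{
  "article": "[manuscript title]",
  "license": "https://creativecommons.org/licenses/by-sa/2.5/",
  "datasets": [
    {"name": "raw-measurements", "url": "https://example.org/data/raw-measurements.csv"},
    {"name": "image-stack", "url": "https://example.org/data/image-stack.zip"}
  ]
}
"""

# Licences this particular robot is prepared to work under (an assumption).
ACCEPTED_LICENSE_PREFIXES = (
    "https://creativecommons.org/licenses/by/",
    "https://creativecommons.org/licenses/by-sa/",
)


def fetchable_datasets(addendum_text):
    """Return (name, url, licence) for each dataset the robot may collect."""
    record = json.loads(addendum_text)
    license_url = record.get("license", "")
    if not license_url.startswith(ACCEPTED_LICENSE_PREFIXES):
        # No recognised licence: back to asking the authors by hand.
        return []
    return [
        (d["name"], d["url"], license_url)
        for d in record.get("datasets", [])
    ]


for name, url, terms in fetchable_datasets(ADDENDUM_JSON):
    print(f"{name}: {url} (licensed under {terms})")
```

The design choice worth noticing is the default: a record with no licence statement yields nothing at all, which is exactly the find-the-permissions situation the addendum is meant to relieve.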

this is just to say

To those of you who read my Simpy RSS feed to see what I’m collecting in my bookmarks, I just want to say that I had nothing to do with the appearance of advertising in that feed and it won’t be staying. I’ve written Otis to find out what’s going on, and to say that I’ll happily pay good money for a premium no-ads service. It would suck to have to switch bookmarking apps again, because Simpy is a great tool, but I will NOT put up with advertising in my RSS feeds. (I just saw one for US Army recruiting, for fuck’s sake! I am NOT going to start shilling for the hired killers. That’s what prompted this entry.)
In fact, since my blogroll is entirely driven by Bloglines, I’m thinking of de-linking anyone who allows advertising in their feeds. Feed ads are intrusive and annoying, and to date RSS feeds have been about the one place on teh internets that has been mercifully free of that particular plague. I’m not at all responsive to the “ads support this service” argument as applied to RSS. I’ve come to terms with on-site advertising as more and more of the bloggers I like have decided to try to make a little extra from their sites, but feed ads are where I draw the line. “You can have my feed with ads, or no more of my writing” gets a simple answer from me: goodbye. I also suspect — though of course I have no data to hand — that this cancellation effect will more than compensate for any increased revenue that feed ads might generate.
Update: On re-reading that, I should add: I don’t want this to sound like a threat, or even (at this stage) a complaint. Otis has been doing a lot of work behind the scenes lately, improving Simpy’s stability and speed, and I presume that the unannounced appearance of the ads is simply a glitch that will speedily be fixed.

The bottom line, and an idea.

Glyn Moody, a relatively new addition to the blogroll, points out the bottom line in all “intellectual property” issues: it’s not property, and anyone who tells you otherwise is lying for profit:

A very interesting transcript of a conversation between Reuters and Warner Music Chief Executive Edgar Bronfman. The latter […] is revealed for what he is when he slips in the Big IP Lie:

Intellectual property is intellectual property, whether it’s in the form of an avatar or a song or any such thing. These are the creations of someone’s mind, and it’s property as real as real estate.

No, Ed, no, no, no. What you call “intellectual property” is really an intellectual monopoly: it is a limited privilege, granted by the state, to encourage creativity. It is not property, however much you might like to claim it implicitly. It is a bargain, with a quid pro quo: it has to allow reasonable “fair use”, and it has to be given up after a reasonable time. You and your industry seem to have forgotten both aspects.

(Quite a lot elided there, so do read the whole post.)
From there to an idea: Glyn pointed to Moving To Freedom, where I found Scott pointing back to Glyn’s The Great Software Schism and sideways to his own thoughts on Free vs Open Source. I had a section on this in my “open access” essay for 3QD, but I cut it out in the interest of brevity, because the open source section was just there for background and I assumed most people reading 3QD would be at least somewhat familiar with it. It went like this:

Richard Stallman started the GNU Project in 1983/4 as a reaction against the rising influence of proprietary software, and a year or so later founded the Free Software Foundation, which “is dedicated to promoting computer users’ rights to use, study, copy, modify, and redistribute computer programs.”  What Stallman and the FSF mean by “free software” is famously summed up by the dictum, “free as in speech, not free as in beer”; more precisely, they mean “free” as in:

  • The freedom to run the program, for any purpose
  • The freedom to study how the program works, and adapt it to your needs
  • The freedom to redistribute copies
  • The freedom to improve the program and release your improvements to the public

Access to the source code is a precondition for these freedoms, and many advocates prefer that the “four fundamental freedoms” also be combined with some form of copyleft (basically a licence which explicitly disallows use of the original resource in any way that restricts the four freedoms for anyone else).

About a decade later the Open Source Initiative appeared, offering itself as a “more pragmatic” approach to free software.  The two definitions are pretty similar, though the OSI version allows some licensing that the FSF considers too restrictive of end users.  A common view of these two groups is that Open Source is a development methodology, whereas Free Software is a social movement.  (You can, if you care to really get into it, read Stallman on why free is better than open source and the OSI on why the term “free” is too ambiguous.  Oy.  Wikipedia is good on all of this if you want more details: open source, open source software, free software.)

So anyway, if you’re not familiar with the “schism”, there’s some background. I’ve argued that the same sort of openness as brought to mind by Free/Open Source Software is vital for the future of science, and since a movement needs a name I’ve tentatively proposed Open Science as the banner under which open access, open data, open standards, open licensing and open source might assemble to their greatest mutual benefit. As it happens though, one of the earliest movements towards what I am calling “open” science was called the Free Science Campaign, run by Stefano Ghirlanda. (The page is offline now. I ran across it while doing my graduate studies, and it is an enduring regret that I never signed up.)
Here’s the idea, then, for all that it opens up an awful can of worms: should we be calling the campaign to free up scientific information (text, data and software) “Free Science”, for the same reasons Stallman insists on “Free Software”?
It would be rather too much to just toss that out there, so here’s my view. While I have great sympathy with Stallman’s arguments in favour of Free, and am personally committed to doing as much of my science completely in the open as I can, I know my tribe. Scientists are a cynical, self-interested lot. For instance, I was scoffed at for recommending BioRoot to colleagues: the whole idea of sharing tends to be seen as naive, asking to be taken advantage of. It’s been my experience that the first response of most scientists to any “open” scheme (like BioRoot, or Open Notebook Science) is not “how cool!” but “what about bad actors? how will you keep from being robbed?” (Which says something about what the culture of science does to a person, but I digress.) To my mind, this largely explains why BioRoot hasn’t taken off as I would have hoped/expected, and is something of which to be wary. I am concerned that “Free Science”, particularly if explicitly connected with “idealistic” Stallman (as contrasted with the “pragmatic” OSI), might meet with a chorus of sneers from the people who need it most. So for now, I think we should stick with “Open Science”.

HUHO blog carnival #1 is up

Remember this? The first blog carnival is up.

…this project is selfish. I need help. But later, I thought, while this plea that would otherwise be considered blegging began to take shape, maybe other people could use the advice. And hey, maybe people who would otherwise consider themselves apart from this sort of daily worry could help too. Some of us need some help finding those bootstraps, hell, finding boots.
So here we are. These are ways to pinch a life that is already pinched, to beat the system, to get by when getting by is what you’re doing already.

It’s a damn good start: children’s entertainment, clothing, education, money management, food and more.