Heather Morrison points to this excellent post by Glen Newton, wherein Glen proposes that Open Access should explicitly include machine readability:
Open Access must include access by machines:
* At minimum one must allow crawls of the site/content or (to reduce the impact of badly configured crawlers) create a compressed XML file containing all metadata and either content, or direct links to content and make it available for download (and if bandwidth is still an issue put it on a P2P network like BitTorrent).
* Preferable is to offer some kind of API (OTMI) or protocol (OAI-PMH) to get at content and metadata and citations.
* Better is to offer access to the XML of the articles in addition to the PDF and/or HTML; if the XML actually has some semantic content, then we are approaching the optimum.
The end goal is to support and encourage text mining and analysis of the full-text (preferably semantically rich XML), metadata and citations to allow literature-based exploration and discovery in support of the scientific research process.
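To make the OAI-PMH option concrete, here is a minimal Python sketch of what a machine reader might do against a repository that exposes that protocol. It is an illustration under stated assumptions, not a description of any particular repository: the base URL `https://repository.example.org/oai` is hypothetical, the sample response is invented for the example, and the function names are mine. The `ListRecords` verb, the `oai_dc` metadata prefix, and the `resumptionToken` paging mechanism are the parts taken from the protocol itself.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and by Dublin Core elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    When continuing a paged harvest, the resumption token is sent
    instead of the metadata prefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (titles, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iter(DC_NS + "title")]
    token_el = next(root.iter(OAI_NS + "resumptionToken"), None)
    token = None
    if token_el is not None:
        # An empty token element marks the last page.
        token = (token_el.text or "").strip() or None
    return titles, token


def harvest_titles(base_url):
    """Yield every title the repository exposes, following resumption tokens."""
    token = None
    while True:
        url = list_records_url(base_url, resumption_token=token)
        with urlopen(url) as resp:
            titles, token = parse_list_records(resp.read())
        yield from titles
        if not token:
            break


# Invented sample response, trimmed to the elements the parser reads.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>An Example Article</dc:title>
      </oai_dc:dc>
    </metadata></record>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>A Second Example Article</dc:title>
      </oai_dc:dc>
    </metadata></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    print(parse_list_records(SAMPLE))
```

Note that a harvest like this typically yields metadata only; getting at the full text, let alone semantically rich XML, is the further step the list above asks for.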
Most importantly: hear, hear!
I do, however, have a nitpick to make. Heather makes no comment on Glen’s idea that this is an addition to the definition of OA, but in fact I think it’s already built into the accepted BBB definition. Peter Suber refers to the removal of price and permission barriers, to distinguish Open from “merely” free access, which removes only price barriers; I’ve quoted him on this before, so here he is again:
The best-known part of the BBB definition is that OA content must be free of charge for all users with an internet connection. However, the BBB definition doesn’t stop at free online access. It adds an extra dimension that isn’t as easy to describe, and consequently is often dropped or obscured. This extra dimension gives users permission for all legitimate scholarly uses. It removes what I’ve called permission barriers, as opposed to price barriers. The Budapest statement puts the extra dimension this way:
By “open access” to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.
The Bethesda and Berlin statements put it this way: For a work to be OA, the copyright holder must consent in advance to let users “copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship”.
All three tributaries of the mainstream BBB definition agree that OA removes both price and permission barriers. Free online access isn’t enough. “Fair use” (“fair dealing” in the UK) isn’t enough.
Having said all that, though, I’ll add that an explicit description of machine readability requirements would be an addition to the accepted definition of OA — and one that I would welcome. Peter Murray-Rust recently noted that, according to the “price and permission barriers” view of Open Access, PubMed isn’t OA — even PubMed Central isn’t OA.
I’ll go even further: can anyone point me to a single Open Access repository? I don’t know of even one such site that removes both price and permission barriers. Surely there must be some, but the Big Names (PubMed Central, arXiv, Cogprints, CiteSeer, RePEc, etc — see ROAR) don’t seem to qualify, because digital objects in these repositories carry their own copyrights, rather than being covered by a blanket license provided by the repository.
Can this be true? Five years after the BBB definition came together, more than ten years since Stevan Harnad’s subversive proposal and on the first day of the NIH mandate — widely referred to as an OA mandate! — can it be that we really don’t have a single truly OA repository in all the world? And if it is true, would it help to make the official definition more explicitly machine-friendly?
IMHO, NO: I think it would be a big strategic mistake if today, when the cupboards are still 85% bare, we were to start insisting that deposits must all be Cordon Bleu ****.
OA just means free online access to the full-text of refereed journal articles. Please let’s not risk getting less by needlessly insisting on more. The rest will come in due time, but what is urgently needed today, and what is still 85% overdue by more than 10 years today, is free online access. Let the Green OA mandates provide that, and the rest will all come naturally with the territory soon enough of its own accord.
But over-reach gratuitously now, and we will just delay the optimal and inevitable, already within our reach, still longer.
Ceterum Censeo: The BBB “definitions” (which were not brought down to us by Moses from On High, but puttered together by muddled mortals, including myself) are not etched in stone, and need some tweaking to get them right.
“Time to Update the BBB Definition of Open Access”
OA is free online access. With that comes, automatically, the individual capability of linking, reading, downloading, storing, printing off, and data-mining (locally).
The further “rights” for 3rd-party databases to data-mine and re-publish will come after universal Green OA mandates generate universal OA (free online access). But you’ll never get universal Green OA mandates if you insist in advance that the 3rd-party re-use rights must be part of the mandate! (Notice that the Harvard mandate has an opt-out, which means it’s not a mandate.)
“On Patience, and Letting (Human) Nature Take Its Course”
And as to demanding machine-readable XML from authors: 85% of authors cannot now be bothered to do even the few keystrokes it takes to deposit the drafts they already have. Does this sound like a reasonable time to ask them to upgrade their drafts to Cordon Bleu XML?
American Scientist Open Access Forum