Did Facebook screw up, or did I?

motherfucker.png

Earlier today I got a notification from someone on my Facebook friends list, leading to a “President Obama Approval Poll”. I voted, clicked away, and then a couple of hours later got an email telling me I’d sent notifications of the same poll to everyone on my friends list.
I am usually pretty careful about permissions for Facebook apps, and I did not notice any opt-out for the spamming of my friends. I honestly think this thing has its privacy settings wrong (deliberately?) and does not give the usual options; it just goes ahead and spams your friends list.
Sorry to anyone who got a “notification” from me — either I got sloppy or, as I suspect, this thing is viral.

Perpetuating an OA myth

Maxine at Nautilus posted a slightly shortened version of this letter to Nature from Raf Aerts; what caught my eye was the rearing of a familiar ugly head (emphasis mine):

…the [global recession] may also be affecting the publication output of research institutions in a more subtle way. It could be boosting the traditional reader-pays publication model for scientific journals at the expense of the author-pays, or open-access, model.
Open-access journals ask authors to pay for processing their manuscripts (which involves organizing a form of quality control, formatting and distribution) so that the final product becomes freely available, and free to use if properly attributed. […]

This myth, that OA is synonymous with author-pays, is a toll-access publisher’s delight. It simply is not true. See here for detail; briefly:

  • in 2005, the Kaufman-Wills group showed that “…more than half of DOAJ [Open Access] journals did not charge author-side fees of any type, whereas more than 75% of ALPSP, AAMC, and HW subset [Toll Access] journals did charge author-side fees.” (Note that this study included only 248 journals from the DOAJ.)
  • in 2007, Peter Suber and Caroline Sutton showed that, of 450 OA journals published by 468 scholarly societies, only 75 — fewer than 20% — charged author-side fees
  • also in 2007, I showed that only 18% of the almost 3000 journals in the whole DOAJ charged author-side fees; 67% did not charge such fees, and the information was missing for 15%.
  • in March 2008, Heather Morrison showed that more than 90% of the psychology journals in the DOAJ charge no publication fee1
  • about a month ago, I showed that only 38 (42%) of the 90 full-OA chemistry journals in the DOAJ charged author-side fees (49% did not charge such fees, and information was missing for 9%).

Raf goes on to say:

…few peer-reviewed open-access journals have so far had a high impact factor in their field, except for a small number such as those published by the Public Library of Science and BioMed Central. They are therefore struggling to emerge and to attract the most prestigious research findings.
This situation could deteriorate further if open-access journals are forced to move to (partial) site licensing in order to cover their production costs — a shift recently undertaken by the Journal of Visualized Experiments, for example — as authors become increasingly reluctant or unable to pay in the current financial climate.

I don’t see why we should assume that anything will “deteriorate” if OA journals switch to new funding models, or that OA journals will have a harder time ‘emerging’ if they move to a model that is actually closer to the old, familiar toll-access model. After all, there already exist a wide variety of ways in which OA publications pay the bills: advertising, endowments, philanthropy, institutional subsidies, memberships, priced editions and more. In particular, hybrid journals (which is what JoVE has become) are popular with toll-access publishers as a way to establish a foothold in OA territory. Inter alia, Elsevier, Springer and Wiley all publish hybrid journals, and between them, those three account for more than 40% of the worldwide science/tech/medicine publishing market — so the hybrid model is pretty well established.
There’s more to say about authors’ willingness and/or ability to pay, too. Firstly, it’s almost never the author who pays, but the funding body paying for that author’s research. At the moment, this can translate into using up precious grant money when there’s a need to pay author-side fees, but with 77 funder, institutional and departmental OA mandates in place and more on the way, it seems reasonable to suppose that more and more of the mandating bodies will underwrite more and more of the costs of publishing. For example, HHMI has institutional agreements/memberships with BMC, Springer and Elsevier, and BMC’s page of funder policies shows that a majority of UK funders either make additional funds available or allow publication charges to be treated as an indirect cost. Many OA journals also waive or reduce their fees on application; for instance, here are the PLoS (scroll down) and BMC policies.
Finally, remember that the Kaufman-Wills study showed that 75% of the toll-access journals surveyed charged author-side fees (page charges, colour charges, reprint charges, etc) in addition to their subscription charges. So when there are author-side fees involved, I’d like to know how those charged by OA journals (in return for which the work is freely available to everyone, forever) compare with those charged by toll-access journals (in return for which, authors often cannot retrieve their own work, and anyone who wants to read it must pay another fee).

1 updated 04/29 after reading this post from Peter Suber

Caste in America (or: hell in a handbasket, yes indeedy)

I don’t spend much time writing about politics any more — my mental health just can’t take it. But, data!
Via 3QuarksDaily: the Office of Management and Budget has a blog, to which director Peter Orszag posted an entry on “The Case for Reform in Education and Health Care”. He describes a talk he gave to the Association of American Universities, and makes his slides available as a pdf. From those slides:
Whether you even start college depends as much on your family’s income as on your ability (insofar as math scores are a decent proxy for ability). For instance, an average student (middle third of math scores) is about twice as likely to go to college if the family income is in the highest bracket as in the lowest. Similarly, if you’re in one of the two lowest income brackets, you can roughly double your chances of going to college by getting your math score up from the middle to the highest third.

enrollment.png


If you do start college, whether or not you graduate also has a lot to do with family income: almost half of the students from the lowest income background do not finish college, whereas the noncompletion rate drops to less than 25% in the highest family income bracket.

completion.png


There is a vicious circle in operation: relative to a high school education, a college education returns roughly four times as much, making you that much more likely to contribute to your children’s success, as shown above. (The ordinate shows the log of the ratio of the return to a college education to the return to a high school education: 10^0.6 is about 4.)
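Spelling out the reading of the ordinate (assuming the curve tops out near 0.6 on the log scale, as in the slide):

$$\log_{10}\!\left(\frac{\text{return to college}}{\text{return to high school}}\right) \approx 0.6 \;\Longrightarrow\; \frac{\text{return to college}}{\text{return to high school}} \approx 10^{0.6} \approx 4$$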

wage premium.png


The vicious circle encompasses more than just school. If you have money, you’re more likely to be insured and to have more formal education; both factors make you much more likely to take part in routine health screens, which in turn makes you more likely to stay healthy, which in turn keeps your earning potential up, and so on.

sick.png


gettingahead2.png

In a similar vein, Ryan Avent adds this figure from the Pew Charitable Trusts’ Economic Mobility Project, which shows that you’re more likely to wind up in the top earning quintile if your parents were in that quintile but you didn’t go to college than if you did go to college but your parents were in the bottom quintile (click the image for a popup or go here):

The rich get richer, the poor get the picture, but Garrett was wrong about one thing: when you’re down so low, that’s right where the bombs are most likely to land. Here’s a little Vonnegut to take us to the news at the top of the hour:

America is the wealthiest nation on Earth, but its people are mainly poor, and poor Americans are urged to hate themselves. To quote the American humorist Kin Hubbard, “It ain’t no disgrace to be poor, but it might as well be.” It is in fact a crime for an American to be poor, even though America is a nation of poor. Every other nation has folk traditions of men who were poor but extremely wise and virtuous, and therefore more estimable than anyone with power and gold. No such tales are told by the American poor. They mock themselves and glorify their betters. The meanest eating or drinking establishment, owned by a man who is himself poor, is very likely to have a sign on its wall asking this cruel question: “If you’re so smart, why ain’t you rich?” […]
Americans, like human beings everywhere, believe many things that are obviously untrue… Their most destructive untruth is that it is very easy for any American to make money. They will not acknowledge how in fact hard money is to come by, and, therefore, those who have no money blame and blame and blame themselves. This inward blame has been a treasure for the rich and powerful, who have had to do less for their poor, publicly and privately, than any other ruling class since, say, Napoleonic times.

word cloud CVs for dummies

Pierre and Pawel both did amazing things with word clouds for their CVs, using all kinds of black magic programming skills that I don’t have. Just for fun, I thought I’d see what the version looked like that any doofus could create. I made a list of all the jobs I’ve had, then listed all the methods I used in each job — making sure to call the same method by the same name each time it came up, so as to provide a basic weighting for the elements in the word cloud.
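If you wanted to script that list-building step rather than do it by hand, a minimal sketch might look like the following; the jobs and methods are invented purely for illustration, and the tilde used to glue phrases together is one reported Wordle convention rather than anything I can vouch for:

```python
from collections import Counter

# Invented example data: each job maps to the methods used in it.
jobs = {
    "grad school": ["cloning", "western blot", "microscopy"],
    "postdoc 1": ["cell culture", "qPCR", "cloning", "western blot"],
    "postdoc 2": ["cell culture", "qPCR", "flow cytometry"],
}

# A method's weight is simply the number of jobs in which it appears.
weights = Counter(method for methods in jobs.values() for method in methods)

# Word-cloud tools size words by frequency, so emit each term repeated
# according to its weight; a tilde reportedly keeps multi-word phrases
# together in Wordle.
lines = []
for method, count in weights.items():
    lines.extend([method.replace(" ", "~")] * count)

print("\n".join(lines))
```
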
Here’s what Wordle made of the resulting list:

wordleme.png

It’s not horrible, though I can already see things I forgot to put in, and I do wish Wordle would keep phrases together1. I guess you could also try doing this with the texts of your published papers, or just the abstracts, or just the Materials and Methods.

1Update: thanks to Piotr, who left me a comment pointing out that Wordle can indeed keep phrases together, here’s an alternative version; now that I see it with phrases intact I’m not sure which is better:
wordleme2.png


(Wordle settings for both versions: language: remove numbers, leave as spelled, remove common English words; font: Telephoto, layout: straighter edges, horizontal; color: Wordly, a little variance.)

out of season

I’ve been meaning to post more verse — my own, and other people’s. Blame Jason for reminding me of this one, even though it’s the wrong end of the season (and I’m nowhere near a campus these days):

Autumn Song for Alec
or, Who’s that dirty old man leering out of the Chem Dept window?
Summer slowly fades away,
campus flowers even brighter:
hectic striplings seize the day,
shorts get shorter, tops crop tighter;
one last blaze of youth and skin,
refracted in the cooling prism
of early autumn; let’s hope win-
ter eases Alec’s priapism!

(How you doin’ Alec? I hope you’re OK.)

That bloody video.

still.png

This video annoyed me the first time I saw it, but I just figured, you know, not everything is made for me. Now it seems to be making another round of the social media stream; it ended up on my radar via FriendFeed, and this time I just had to say something.
First of all, that’s five minutes you’ll never get back. Five minutes isn’t much, but when you only have 30 or 60 minutes a day to spend online — as, e.g., I did in my last job — you resent every stolen second. This is why I hate, with a fierce and curmudgeonly hate, multimedia without transcripts or text versions.
Secondly, here’s the content — in a form you can use at your own pace without needing pause and fast forward buttons:

  • if you’re 1 in a million in China, there are 1300 people just like you
  • China will soon become the number 1 English speaking country in the world
  • the 25% of India’s population with the highest IQs is greater than the total population of the United States
  • translation: India has more honors kids than America has kids
  • the top 10 in-demand jobs in 2010 did not exist in 2004
  • we are currently preparing students for jobs that don’t yet exist, using technologies that haven’t been invented, in order to solve problems we don’t even know are problems yet
  • US Dept of Labor estimates that today’s learner will have 10-14 jobs by the age of 38
  • 1 in 4 workers have been with their current employer less than a year; 1 in 2 have been there less than five years
  • 1 in 8 couples married in the US last year met online
  • if MySpace were a country, its 200 million registered users would make it the 5th largest in the world, between Indonesia and Brazil
  • the #1 ranked country in broadband internet penetration is Bermuda; #19 the US; #22 Japan
  • we are living in exponential times
  • Google searches: 2008, 31 billion/month; 2006, 2.7 billion/month
  • to whom were these questions addressed Before Google?
  • the first commercial text message was sent in Dec 1992; today, the number of text messages sent and received every day exceeds the total population of the planet
  • years it took to reach a market audience of 50 million: radio 38 years; television 13 years; internet 4 years; iPod 3 years; facebook 2 years.
  • in 1984 there were 1,000 internet devices, in 1992 there were 1,000,000, in 2008 there were 1,000,000,000
  • there are about 540,000 words in the English language, 5 X as many as in Shakespeare’s time
  • it is estimated that a week’s worth of the NY Times contains more information than a person was likely to come across in a lifetime in the 18th century
  • it is estimated that 4 exabytes (4×10^19 bytes) of unique information will be generated this year — more than in the previous 5,000 years
  • the amount of new technical information is doubling every 2 years; for students in a 4-year degree this means that half of what they learn in their first year of study will be outdated by their third year
  • NTT Japan has successfully tested a fiber optic cable that pushes 14 trillion bits/second down a single strand of fiber — that is 2,660 CDs or 210 million phone calls every second
  • it is currently tripling every 6 months and expected to do so for the next 20 years
  • by 2013, a supercomputer will be built that exceeds the computational capabilities of the human brain
  • predictions are that by 2049, a $1000 computer will exceed the computational capabilities of the entire human species
  • during the course of this presentation (4:55), 67 babies were born in the US, 274 were born in China, 395 were born in India and 694,000 songs were downloaded illegally
  • credit: Karl Fisch, Scott McLeod, and Jeff Brenman

When you see it like that, not zooming out at you with a soundtrack and a bunch of twee effects, it becomes obvious that there’s nothing much there, and what there is, is rather disjointed and incoherent. Many of the factoids look shaky to me, and there are only a couple of references or sources provided (why not provide the others?). I’m not going to bother with a fisking, but here are some obvious eyebrow-raisers:

  • All that stuff about China and India smacks of xenophobic scaremongering to me — I very much doubt that’s the intent, but there’s nothing to tie it to the technological stuff, so it starts to sound like “flee, the brown people are coming!”
  • “We are currently preparing…” — feels good, but means nothing; it’s just an overblown description of what good teachers have always done.
  • “We are living in exponential times” —  that word (“exponential”), I don’t think it means what you think it means…
  • OK, the Google searches, text messages and years-to-50-million stuff is neat, though I still want sources.
  • The prefix exa- denotes 10^18; even using the unofficial binary-base interpretation, 4 exabytes is about 4.61 x 10^18 bytes (See what I did there, with the links to my sources? In a slideshow, you can do that with footnotes and a final slide.)
  • In any case, 4 or 40 exabytes of what? How do you define/count “unique information”?
  • Even if we gloss over “unique information”, how do any of the other quoted rates of change square with “more than in the previous 5,000 years”? What would that mean for the following 1/5000th of a year (~1.75 hours)? In other words, we must have maxed out — right?
  • If the optical fiber example needs a human-scale yardstick, so does 4 exabytes: if you wrote that data to CD-ROM and covered a football field with the discs, the resulting stack would be about 16 m high, or roughly the height of a four-storey house (that arithmetic is sketched just below this list).
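Since I’m demanding yardsticks, here’s a quick back-of-the-envelope version of that CD-stack calculation; the disc capacity, field area and packing are all rough assumptions, so treat the output as order-of-magnitude only.

```python
# Back-of-the-envelope check of the "CDs on a football field" yardstick.
# All the inputs below are rough assumptions, not exact figures.
DATA_BYTES = 4e18          # 4 exabytes (SI definition: 10^18 bytes)
CD_BYTES = 700e6           # ~700 MB per CD-ROM
CD_DIAMETER_M = 0.12       # 12 cm disc
CD_THICKNESS_M = 0.0012    # 1.2 mm disc
FIELD_AREA_M2 = 110 * 49   # roughly an American football field incl. end zones

n_discs = DATA_BYTES / CD_BYTES
discs_per_layer = FIELD_AREA_M2 / (CD_DIAMETER_M ** 2)  # crude square packing
layers = n_discs / discs_per_layer
stack_height_m = layers * CD_THICKNESS_M

print(f"{n_discs:.2e} discs, stack ~{stack_height_m:.0f} m high")
# With these assumptions the stack comes out in the high teens of metres,
# i.e. the same ballpark as the ~16 m quoted above.
```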

Update, written after all of the above:
It’s important to note that although the version discussed above is the only one I’d ever seen before today, it is actually the third version on YouTube and was “remixed” by Sony BMG in August 2008. The original was made by Karl Fisch in August 2006; Scott McLeod’s version dates to January 2007 (this was the first one to make it to YouTube and was responsible for the first viral wave); Jeff Brenman created a SlideShare version a couple of months later, and the official version 2.0 was made in consultation with XPLANE in June 2007.
In fairness to Fisch (sounds like a PETA chant), many of the shortcomings of the version that so annoyed me must be laid at the feet of the anonymous Sony drone responsible for the “remix”.
Not only did Fisch provide a text version and a list of his sources with version 1.0, but version 2.0 does a better job than the Sony version of acknowledging the sources in the course of the presentation and even comes with its own wiki, mentioned in the presentation. Version 2.0 is also considerably more coherent and much nicer to look at, and does a (somewhat) better job of avoiding the “eek, brown people!” tone. (Fisch says in a couple of places that he and McLeod, in response to criticism, consciously worked to reduce that “us vs them” feeling, and points out here that he views it as largely an unforeseen side-effect of some of the changes between his original PowerPoint version, made for his immediate colleagues, and the first YouTube version.) Finally, kudos for choosing a Creative Commons license (even though I don’t like copyleft): although the Sony version leaves this out, all versions are CC-BY-NC-SA (source files are available on the wiki).
In my opinion it’s a damn shame that the Sony version took off (at the time of writing, there are two copies on YouTube with 4,458,229 and 29,828 views, respectively). If you come across someone talking about that version, do everyone a favour and point them to version 2.0.

Scholarly (scientific) journals vs total serials: % price increase 1990-2009

Following on from this post, I manually extracted historical data for average scholarly journal prices in a dozen broad disciplines from the Library Journal Annual Periodicals Price Surveys by Lee Van Orsdel and Kathleen Born, and compared these with three datasets from the earlier post: ARL libraries’ median total serials expenditures (ARL all serials), Abridged Index Medicus average journal price (AIM) and the consumer price index (CPI):


LJ.png

My concern with the AIM dataset was that it was too small and specialized to support broad conclusions, but it turns out that the AIM data sit somewhere in the middle of the disciplines analysed. Astronomy is closest to the ARL all serials median, with math and computer science not much worse; general science is the worst offender, with engineering and technology, chemistry and food science not far behind. From 1990 to 2008, total price increases ranged from 238% (astronomy) to 537% (general science); that’s 3.7 and 8.3 times the increase in the CPI, respectively.
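For clarity, those multiples are simply each discipline’s total increase divided by the CPI increase over the same period (roughly 65% for 1990–2008, the figure used in the earlier post):

$$\frac{238\%}{65\%} \approx 3.7 \qquad\qquad \frac{537\%}{65\%} \approx 8.3$$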
This dataset covers an average of around 3600 journals from 2005-2009, 3255 from 1997-2001 and 2655 from 1989-1990. I think this represents good evidence that historical price data for total serials, even though it shows a rate of increase far greater than that of the CPI, masks an even greater rate of increase among scholarly (scientific) journals. It’s difficult to look at that graph and believe that scholarly publishers are playing fair, particularly when one remembers that online publishing, with its attendant cost reductions, came of age during the same period of time.
The Van Orsdel/Born surveys include a number of other scholarly disciplines (art, architecture, business, history, language, law, music, etc etc). If I have the time I’ll work those up as well, to provide as broad a picture as possible. I should also include numbers of titles in each discipline, to give some idea of total influence. For instance: although general science (around 60 or 70 titles) shows the greatest increase, it likely contributes far less to the serials crisis than health sciences (more than 1500 titles).
(The data are available in this Excel spreadsheet.)

Some wishes come true.

A while back, I posted about my discovery (new to me, though not new to many others) that the serials crisis should probably be called something like the “scholarly journals crisis”. The term “serials” includes a wide range of publications, most of which are not peer-reviewed scholarly journals — newspapers, government reports issued in series, yearbooks, magazines and more. Only about 1/10 of the serials in Ulrich’s directory are peer-reviewed. The average scholarly journal costs around 10 times as much as the average serial, and while the cost of the scholarly literature continues to climb, median serial unit costs at ARL libraries have actually been falling for the last seven or eight years (Fig 1 below). It therefore appears that scholarly journals are the driving force behind the serials crisis.
At the time, I wished that I had some specific data to show the difference between scholarly and average serials — hence the title of this post: via medinfo, I learned that EBSCO Information Services has released a brief report (pdf!) on the price history of well regarded clinical journals, using 117 titles from the NLM’s Abridged Index Medicus (AIM). This is a curated list of biomed journals “of immediate interest to the practicing physician” and can be searched on PubMed as a subset limit named “core clinical journals”.
As a reminder, here’s that graph; it’s from the ARL stats report from 2004-5 and the reason it’s famous is the way that “Serials Expenditures” outstrips the Consumer Price Index (CPI) and other measures:

ARL.png


Here’s a comparison of that data with the price history of the AIM journals; the line labeled “expser/ARL libraries all serials” shows the 1990-2005 subset of the “Serials Expenditures” data from Fig 1, and “EBSCO/core clinical journals” shows the AIM data:

EBSCO.png

Data labels (ARL data from here):

  • serpur: Current Serials Purchased, median value from all ARL libraries
  • expser: Expenditures for Serials, median etc
  • totsal: Total Salaries & Wages, median etc
  • serunit: Serial Unit Cost; median value of expser/serpur calculated for all ARL libraries
  • EBSCO: average price per journal in the Abridged Index Medicus set
  • CPI-U: Consumer Price Index, all urban consumers, annual average, not seasonally adjusted

This is exactly what I wished for, hard evidence of the difference between scholarly and average serials; and what that evidence strongly indicates is that price increases in scholarly journals are driving the serials crisis. Scholarly journals far outstrip total serials in terms of annual price increase, even though the latter shows a much more rapid increase than the CPI. In contrast, library salary expenditure follows the CPI closely, and median serial unit cost (all serials) has been dropping slowly since 2000.
Frankly, I’m tempted to name this the Big Fat Ripoff Graph. Between 1990 and 2008, the CPI increased by about 65%, whereas over the same period the average price of an AIM journal increased by 415%, a 6.4-fold difference. I’ve seen publishers try to defend the “total serials expenditures” vs CPI discrepancy by pointing out that journals are proliferating — indeed, the “serials purchased” curve is headed upwards at an increasing rate, particularly over the last five years or so. But that defense is no good against the BFR Graph, on which the most damning curve shows average journal prices. I’ve also seen comments to the effect that if mean or median serial unit costs are dropping, publishers must be offering increasing value for money even if they are charging more in total. That might be true of the set of “all serials publishers”, but it’s apparent from the BFR Graph that scholarly journal publishers can make no such claim.
It must be remembered, of course, that we are only looking at a little over a hundred clinical journals here, a small and discipline-specific subset. Nonetheless, the result is so striking that I think it is a considerable inducement to gather more data. Since it seems my wishes for more work are coming true, I’ll make another: now I want price history data for other, larger journal subsets in other scholarly disciplines. I wonder what the BFR Graph looks like for those datasets?
(P.S. If you want the numbers I used, or to check my work, the spreadsheet is here.)

Update: ha! I just got around to reading this article, linked by Peter Suber a couple of days ago; turns out it’s full of annual price data, and Van Orsdel and Born have been doing these surveys for at least ten years. There doesn’t seem to be a central collection or data collation, so I’ll have to piece it together. Stay tuned!

What’s wrong with copyleft?

This FriendFeed thread regarding the Wikipedia licensing vote has stirred up an old hornet’s nest of issues surrounding copyleft and noncommercial clauses in Open licenses. As I said in the thread, I get most of my ideas on this topic from David Wiley, and have posted about those ideas before. Herewith another attempt to organize and clarify my thoughts, as much for my own benefit as anything:
1. The purpose of Open licensing is to enable the following (this is straight from David’s Open Education License draft, about which more later):

  • Reuse – Use the work verbatim, just exactly as you found it
  • Rework – Alter or transform the work so that it better meets your needs
  • Remix – Combine the (verbatim or altered) work with other works to better meet your needs
  • Redistribute – Share the verbatim work, the reworked work, or the remixed work with others

2. The purpose of restrictive clauses in such licensing is to prevent specific types of reuse, rework, remix and/or redistribution:

2a. Copyleft prevents future copyright lockup by requiring that all downstream (reworked or remixed) works be similarly licensed.
2b. Noncommercial clauses prevent profitmaking, and are complicated, and I’m not getting any further into it than that right now. (Maybe later, if my brain doesn’t melt.)

3. Although copyleft and NC clauses achieve their own immediate goals, widespread license incompatibility1 means that they often (perhaps usually) defeat part of the larger purpose of Open licensing. The use case where this is most prominent is remix2, since reuse and redistribution of individual copylefted or NC-licensed works or their derivatives is usually just a matter of retaining the original license. But multiple works can only be recombined into new works if their respective licenses are compatible — otherwise, there’s no licensing option for the remix that doesn’t violate the licensing terms of at least one of the ingredients. Not only that, but if any of the works in the mix carries a copyleft license, that license takes over the entire remix and everything downstream of it, thus propagating the incompatibility problem.
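To make the remix problem concrete, here is a toy sketch in Python. The compatibility table is invented and drastically simplified (a handful of licences, no versions, certainly not legal advice); the point is only to show how the options shrink as ingredients are added, and how a single copyleft ingredient dictates the licence of everything downstream:

```python
# Toy model of remix licensing. REMIX_OPTIONS is an invented, drastically
# simplified table: for each ingredient licence, the licences a derivative
# remix may carry. It is illustrative only, not legal advice.
REMIX_OPTIONS = {
    "CC0":      {"CC0", "CC-BY", "CC-BY-SA", "CC-BY-NC"},
    "CC-BY":    {"CC-BY", "CC-BY-SA", "CC-BY-NC"},
    "CC-BY-SA": {"CC-BY-SA"},   # copyleft: derivatives must stay ShareAlike
    "CC-BY-NC": {"CC-BY-NC"},   # noncommercial: derivatives must stay NC
}

def remix_licences(ingredients):
    """Return the set of licences a remix of the given works may carry."""
    allowed = None
    for licence in ingredients:
        options = REMIX_OPTIONS[licence]
        allowed = options if allowed is None else allowed & options
    return allowed or set()

print(remix_licences(["CC0", "CC-BY"]))          # several options remain
print(remix_licences(["CC-BY", "CC-BY-SA"]))     # copyleft takes over: {'CC-BY-SA'}
print(remix_licences(["CC-BY-SA", "CC-BY-NC"]))  # empty set: no legal remix
```
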
4. One last thing: could copyleft be saved from itself? What if someone wanted copyleft protection, without the compatibility issues? Creative Commons is already beginning to build the only solution I can think of: widespread interoperability agreements between existing and any newly developed copyleft licenses. CC-BY-SA 3.0 contains the following clause:

You may distribute, publicly display, publicly perform, or publicly digitally perform a Derivative Work only under: (i) the terms of this License; (ii) a later version of this License with the same License Elements as this License; (iii) either the Creative Commons (Unported) license or a Creative Commons jurisdiction license (either this or a later license version) that contains the same License Elements as this License (e.g. Attribution-ShareAlike 3.0 (Unported)); (iv) a Creative Commons Compatible License.

where (iv) is defined as

a license that is listed at http://creativecommons.org/compatiblelicenses

Sadly, the cupboard remains bare so far:

Please note that to date, Creative Commons has not approved any licenses for compatibility; however, we are hopeful that we may be able to do so in the future. If you would like to discuss the possible compatibility of your license with a Creative Commons license, please email us at info@creativecommons.org.

I am personally persuaded that the Public Domain is the best way out of the copyleft trap, which is why I use CCZero for everything I make.

————-
1 Among CC licenses, there is only about 33% compatibility, and that drops to 20% among NC and SA versions — including self-compatibility*:

cccompatibility.png

Restrictive (NC, SA) versions currently account for around 80% of worldwide CC licence uptake. Once you start factoring in the dozens and dozens of other Open/Free licenses out there, it only gets worse. The FSF and OSI maintain lists of licenses and compatibilities (here and here, respectively), and Wikipedia includes a couple of fairly extensive comparison tables. Speaking of Wikipedia, the world’s favourite online encyclopaedia is currently released under the GNU Free Documentation License, which is not compatible with any CC license except Public Domain, though it does allow transition to CC-BY-SA. If the current vote on that transition is “yes”, that will be a step forward — but it will still leave Wikipedia with the compatibility problems shown in the figure above. Exploration of compatibility issues with all the other Free/Open licenses is left as an exercise, etc.
* from here and here; green indicates compatibility, light green indicates possible compatibility — some disagreement between sources.

2This is why I consider David’s “Four R’s” formulation so important: it makes a clear distinction between rework and remix that is essential to understanding the aims and implementation of Open licenses.

Anniversary of sorts

This question from Antony Williams on FriendFeed:

Is PubChem Data Open or not? There are many discussions saying that PubChem data are Open but I see PubChem as a host and the disclaimer does not say “open”: http://tinyurl.com/e78as

reminded me that it’s almost a year to the day since Egon Willighagen asked a similar question about PubMed Central content:

I was wondering about this section in the CC license of much of the PMC content, such as our paper on userscripts (section 4a of the CC-BY 2.0):

    You may not distribute, publicly display, publicly perform, or publicly digitally perform the Work with any technological measures that control access or use of the Work in a manner inconsistent with the terms of this License Agreement.

CC-BY 3.0 reads differently, but has similar aims. […] Peter [Murray-Rust, see here] indicates that the NIH has put in place ‘technological measures to control access’ to the distribution of our work on userscripts (the PMC entry). That is in clear violation of the CC license. […] What the PMC website should indicate, instead, is that text mining is allowed for the PMC OAI subset, but that they would highly prefer to use the PMC OAI or PMC FTP routes. This is the least they have to do.

No matter what, I still have the feeling that any technical obstacles are disallowed by the CC-license. Any legal expert here, that can explain me if the CC license allows controlling how people have access to my material?

These are both very good questions, and I still don’t have an answer for Egon’s even after a year. I’m reluctant to go pestering John Wilbanks with every CC-related question I come across, so I’m reposting in the hope that someone will be able to save John from me.