That’s the way you do it!

Via Peter Suber, I am delighted to find that Stuart Shieber has started a weblog, and even more delighted that in one of his first entries he has turned my long-ago author-side fees DOAJ hack into an actual, readily reproducible study:

Here are the results computed by my software, as of May 26, 2009:

Charges.......................951  (23.14%)
No charges....................2889 (70.29%)
Information missing...........270  (6.57%)
Hybrid........................1519 (26.99%)

The numbers are consistent with those of Hooker’s study some 16 months earlier.

It’s great to have the numbers confirmed, and even better to be able to make regular updates and construct time series. Thanks to Stuart for doing it right, and for making the code freely available.
(Note, had to reformat the quoted table into ugly text, because I still can’t get MT to play nice. Grrr.)

What use are research patents?

DrugMonkey has a conversation going about the ongoing kerfuffle over (micro)blogging of conference presentations (see also the FriendFeed discussion). I want to go off on a tangent from something that came up in his comment thread, so rather than derail it I thought I’d post here.
In his first comment in the thread, David Crotty made the following claim:

Lots of researchers support their families and labs through money generated by patents, and most universities are heavily dependent upon their patent portfolios for funding.

That doesn’t accord with my (limited!) experience — I know a few researchers who hold multiple patents, and none of them ever made any money that way — and my general impression is that the return on investment for tech transfer offices and the like is fairly dismal.
This seems like the sort of beans that beancounters everywhere should be counting, so I asked on FriendFeed whether anyone knew of any data to address the question of whether universities really make much money from patents. Christina Pikas pointed me to the Association of University Technology Managers, whose 2007 Licensing Activity Survey is now available.
I extracted data for 154 universities and 27 hospitals and research institutions. Between them, in 2007, these institutions filed 11116 patent applications, were awarded 3512 patents, and gave rise to 538 start-up companies. I calculated licensing income as a percentage of research expenditure:


Apart from New York University (I wonder what they own that’s so profitable?), it’s clear that none of these universities are “heavily dependent upon their patent portfolios for funding”. In fact, more than half of them (78/154) made less than 1% of their research expenditure back in licensing income, and the great majority (144/154) made less than 10%.
Licensing income for Massachusetts General Hospital and “City of Hope National Medical Ctr. & Beckman Research” (whoever they are) amounted to 65-70% of research expenditure, but none of the other hospitals or research institutions made more than 20%. More than half of this group (15/27) made less than 2%, and most of them (23/27) made less than 10%.
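For anyone who wants to repeat the calculation, here is a minimal sketch of the arithmetic behind those counts: licensing income as a percentage of research expenditure, then a tally of institutions under a given threshold. The institution names and dollar figures are made up for illustration and are not taken from the AUTM survey; the function names are my own.

```python
def licensing_share(income, expenditure):
    """Return licensing income as a percentage of research expenditure."""
    if expenditure <= 0:
        raise ValueError("research expenditure must be positive")
    return 100.0 * income / expenditure


def count_below(shares, threshold):
    """Count how many percentage values fall strictly below the threshold."""
    return sum(1 for share in shares if share < threshold)


# Hypothetical rows: (institution, licensing income, research expenditure), in dollars.
rows = [
    ("Institution A", 500_000, 100_000_000),
    ("Institution B", 4_000_000, 80_000_000),
    ("Institution C", 30_000_000, 200_000_000),
]

shares = {name: licensing_share(income, spend) for name, income, spend in rows}

for name, share in shares.items():
    print(f"{name}: {share:.2f}% of research expenditure")

print("below 1%:", count_below(shares.values(), 1.0), "of", len(shares))
print("below 10%:", count_below(shares.values(), 10.0), "of", len(shares))
```

Swapping the made-up rows for the survey columns should give the same kind of tallies quoted above.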
The distribution looks just about as you would expect:


I also wondered whether there was any evidence that greater numbers of patents awarded, or more money spent per patent, resulted in higher licensing income. As you can see, the answer is no (insets show the same plots with the circled outliers removed):


I don’t know how representative this dataset is; there are several thousand universities and colleges in the US, and surely even more hospitals and research institutions, so the sample size is relatively small. It does include some big names, though (Harvard, Johns Hopkins, MIT, Stanford, U of California), and I would expect a list of schools answering the AUTM survey to be weighted towards those schools with an emphasis on tech transfer.
In any case, I’m not buying David’s assertion that “most universities”, or most hospitals or research institutes for that matter, rely heavily on licensing income. And that being so, I am also somewhat skeptical about the number of researchers’ families being supported by patents.
What’s the Open Science connection? Well, if you’re interested in patenting the results of your research, there are a lot of restrictions on how you can disseminate your results. You can’t keep an Open Notebook, or upload unprotected work to a preprint server or publicly-searchable repository, or even in many cases talk about the IP-related parts of your work at conferences. It seems from the data above that most universities would not be losing much if they gave up chasing patents entirely; nor would they be risking much future income, since so few seem to get significant funds from licensing. My own feeling is that any real or potential losses would be much more than offset by the gains in opportunities for collaboration and full exploitation of research data that come with an Open approach.
1. Christina left a comment pointing out that patents may be required for more than simply making money from licensing:

…an extremely important reason universities patent [is] to protect their work so that they may exploit it for future research… it turns out that universities have to patent in life sciences – even if they don’t actively market and license these patents – to be able to attract new research money from industry.

There are two distinct points here: first, that if you don’t patent you may not attract industry partners, and second, that if you don’t patent you may end up licensing your own tech back from someone else (I note that most tech licenses I know of are cheap or free “for research purposes” so the latter factor might not weigh so heavily). According to the 2007 AUTM data, industry investment in academic research amounted to about 7% of research expenditure and was up 15% over 2006.
2. David responded on DM’s thread with some counter-evidence; on reading it, I realise that the data above may (likely?) show only what the university received, and not any money that went to the labs or researchers involved. Tech transfer may not be financially worth it for the university, but it might still be doing good things for individual labs and PIs, and so would constitute a support service the university offers its research community. It also strikes me that my experience, such as it is, is mainly with Australian researchers, whereas David’s is in the US, so cultural differences may also apply.
3. More from Christina at her own place, here.
If you want the data, the spreadsheet I used is here.

What happened to serials prices in 1986-87? (Update: probably nothing.)

This could be nothing but an artifact (e.g. of the way the data were collected), but if you look at Fig 1 from this post, there’s a clear break in the serials expenses (EXPSER) curve that’s not evident in any of the others. Here’s the same plot reworked to emphasize what I’m talking about:


If you squint just right you can imagine a similar but much weaker effect, beginning a year or two later, in the total expenditures (TOTEXP) curve; and the salaries (TOTSAL) curve seems to start a similar upward trend at about the same time but then levels off after 1991 or so. I wouldn’t put any weight on either of those observations though — I’d never have noticed either if I hadn’t been comparing carefully with the EXPSER curve.
I’ve added linear regression lines for the 1976-1986 and 1987-2003 sections of the EXPSER data, just to emphasize the change in rate of increase. For those of you who will twitch until they know, just ‘cos, the correlation coefficients of the two fits are 0.99 and 0.98 respectively. If you extrapolate from just the 76-86 section, TOTEXP exceeds the forecast for EXPSER after about 2000.
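The two-segment fit is easy to reproduce. What follows is a rough sketch, assuming an ordinary least-squares line per segment and a Pearson correlation coefficient as the goodness-of-fit number; the series is synthetic (a slope that changes after 1986), not the ARL data, and the function name is my own.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys).

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Synthetic series standing in for EXPSER: slope 2 up to 1986, slope 5 afterwards.
years = list(range(1976, 2004))
values = [2.0 * (y - 1976) if y <= 1986 else 20.0 + 5.0 * (y - 1986) for y in years]

# Split at the suspected break and fit each section separately.
early = [(y, v) for y, v in zip(years, values) if y <= 1986]
late = [(y, v) for y, v in zip(years, values) if y >= 1987]

early_slope, early_intercept, early_r = linear_fit(*zip(*early))
late_slope, late_intercept, late_r = linear_fit(*zip(*late))

# Extrapolate the early trend to the last year, for comparison with the actual value.
forecast_2003 = early_slope * 2003 + early_intercept

print(f"1976-1986: slope {early_slope:.2f}, r {early_r:.2f}")
print(f"1987-2003: slope {late_slope:.2f}, r {late_r:.2f}")
print(f"early trend extrapolated to 2003: {forecast_2003:.1f} (actual {values[-1]:.1f})")
```

A high correlation within each segment says only that each piece is close to a straight line; it does not show that the break between them is real.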
I have no idea if this means anything, but it is tempting to speculate. For instance: when did the big mergers begin in Big Publishing, and when did the big publishing companies start the odious practice of “bundling”, that is, selling their subscriptions in packages so that libraries are forced to subscribe to journals they don’t want just to get the ones they do?

Update: it’s probably nothing; the curve simply shows an increasing rate of increase, and you can break it up into at least three reasonably convincing-looking segments with breaks at 86-87 and 94-95. It’s possible there were two “pricing events” around those times, but I think this is most likely just an illustration of what can happen when you look a little too hard for patterns in your data!


Every little bit counts.

There are so many good causes, and so many of them are not just good but urgent — even assuming you have some money to spare, where are you to donate it? Everyone has their own solution to this problem. Mine is to try to hedge my bets: donate roughly equally to long- and short-term, local and global, human and environmental. I’m out of work and thoroughly skint right now, but I try to remember that by world standards I’m still living like a king; my budget includes some “don’t go insane” funds for occasional movies or dinners out or whatever, and I can always skip one of those in order to give just a little to some good cause.
One such is the Open Knowledge Foundation, which is turning five and asking for support:

This month the Open Knowledge Foundation is five years old.

Over those last five years we’ve done much to promote open access to information, from sonnets to stats, genes to geodata, not only in the form of specific projects like Open Shakespeare and Public Domain Works but also in the creation of tools such as KnowledgeForge and the Comprehensive Knowledge Archive Network, standards such as the Open Knowledge Definition, and events such as OKCon, designed to benefit the wider open knowledge community. (More about what we’ve been up to just over the last year can be found in our latest annual report).

While we have achieved a lot, we believe we can do much, much more. We are therefore reaching out to our community and asking you to help us take our vision further.

Our aim: at least a 100 supporters committed to making regular, ongoing donations of £5 (EUR 6, $7.50) or more a month.

These funds will be essential in expanding and sustaining our work by allowing us to invest in infrastructure and employ modest central support. To pledge yourself as one of those supporters all you need to do is take 30 seconds to sign up to our “100 supporters” pledge at:

And if you want to act on the pledge right now (or make any other kind of donation), please visit:

We are and will remain a not-for-profit organization, built on the work of passionate volunteers, but these additional funds are essential in maintaining and extending our effort. Become a supporter and help us take our work forward!

I’m in no position to make a regular commitment, but I skipped a movie and sent ’em ten quid. It’s not much but it’s my hope that small donations can be a powerful force in the internet age. The other thing I can donate is publicity, which is what this post is for.
Why donate to OKF? My belief is that openness is not only our best weapon in the unending battle against bad actors and free riders, it is the key to a radically more efficient scientific process, which in turn is the key to all material progress in human quality of life.
The OKF not only builds tools and standards for open exchange of information, but they are also part of the front line effort to make openness and transparency into a constant, widely adopted habit of mind and of behaviour. To choose a topical example, we won’t have appropriate access to information about the spending habits of our elected officials until we are so in the habit of openness that it is a surprise and an affront to the average citizen to realise that such information is being kept secret. To choose my own bête noire as another example, we won’t be free of “data not shown” in the scientific literature until the majority of scientists respond to that phrase with an immediate and indignant “why the hell not?”.
So, support for the OKF is one of my long-term choices: an investment in a better future for everybody. If you have a couple of dollars to spare, please consider investing with me.

Pick an index, any index.

Over at The Scholarly Kitchen, Philip Davis takes the ARL to task for comparing their serials expenditures with the Consumer Price Index:

By adopting the CPI as a general frame of reference, almost any industry that requires huge professional worker input will look like it is spiraling out of control. Perhaps this is the reason the ARL uses the Consumer Price Index as a reference for journal prices when it could have used the Higher Education Price Index, the Producer Price Index, or an index which more closely resembles professional knowledge production.
The CPI is an excellent tool for collective salary bargaining, for estimating who should be eligible for food stamps or free school lunches. It is a very bad tool for measuring the purchasing power of libraries or justifying a reinvention of the journal publication system.

Since I’ve just played around with updating the famous graph to which Davis takes exception, I thought I’d better take a closer look at the alternative indices he suggests.
From the Commonfund 2008 HEPI Report (pdf; linked from here) I extracted historical HEPI and CPI data from 1976 to 2003, and from the ARL stats interface at U Virginia I extracted the median values for serials expenditures (EXPSER), total salaries expenditures (TOTSAL) and total expenditures (TOTEXP) for the same period (it was limitations in the ARL data range that dictated the time period). I also extracted Producer Price Index data for “all commodities” (PPI ALL) over the same period from the Bureau of Labor Statistics. There are lots of choices for PPI data, but most of them don’t go back as far as 1976. (I did try a couple of industries that I thought required “huge professional worker input”, such as hospitals and book publishers, but the data weren’t available for all the years I wanted — and by eyeball it didn’t look as though they showed much greater increase than the all commodities index.)
Plotting percent cumulative change against time we see:


There isn’t a lot of difference between the HEPI and the CPI, and the all commodities PPI shows even less increase. Davis suggests that salaries (professional worker input) are at least part of the reason why the CPI is a poor choice for comparison with serials costs, but (to the extent that the HEPI is a better “professional worker weighted” measure) the data do not bear him out. Neither does his claim regarding librarian salaries fit the data I have to hand:
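For reference, “percent cumulative change” here means each year’s value expressed as a percentage change from the first year in the series, which puts indices and dollar amounts on the same axis. A minimal sketch, with made-up numbers rather than the HEPI, CPI or ARL figures, and a function name of my own choosing:

```python
def percent_cumulative_change(series):
    """Express each value as a percentage change from the first value."""
    base = series[0]
    if base == 0:
        raise ValueError("the base-year value must be non-zero")
    return [100.0 * (value - base) / base for value in series]


# Hypothetical series on very different scales, made comparable by rebasing.
index_values = [100.0, 110.0, 150.0]        # a price index
dollar_values = [2_000.0, 3_000.0, 8_000.0] # an expenditure in dollars

index_change = percent_cumulative_change(index_values)
dollar_change = percent_cumulative_change(dollar_values)

print("index:", index_change)
print("dollars:", dollar_change)
```

Because every series starts at zero in the base year, the choice of base year affects how far apart the curves end up, so comparisons should use the same starting year throughout.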

If we plotted academic librarian salaries against the CPI, we could claim that the profession was in crisis, that salary growth was unsustainable, and that the system was simply broken.

It’s clear from the data that library salary expenditures have outstripped the HEPI and CPI, though not by as much as total expenses and not by nearly as much as serials costs.
Remember, too, that this is still only part of the story: “serials” includes a great many publications whose costs have not increased at the same rate as the scholarly literature. The Abridged Index Medicus data I got from EBSCO only cover 1990 onwards, so I reworked the comparison to include the AIM data:


I used the AIM data because comparison with a much larger data set, broken down by individual discipline, showed that the AIM data gave what looks like a reasonable “middle value” — and as you can see, scholarly journal price increases outstrip all others, including total serials, by a considerable margin.
Note also that there’s little difference between “total salaries” and “professional salaries” — the professional salary data series (SALPRF) only goes back to 1986, which is why I’ve included it in this second graph.
None of this is to say that the CPI is the ideal comparison index against which to measure increases in the cost of the scholarly literature. It seems from the comparisons above, though, that there’s not much difference for this particular purpose between the CPI and the HEPI. While I don’t have data for publishing industry salaries, library salaries hew fairly closely to the HEPI and to total library expenditures. It therefore doesn’t seem that salaries have much to do with the much-bruited discrepancy between “general cost of living/doing business/whatever” increases and the rise and rise of the cost of scholarly literature.
If you want the data I used, the spreadsheet is here.