An Open Access partisan’s view of “Electronic Publication and the Narrowing of Science and Scholarship”

There’s been a good deal of online chatter about this recent Science article that discusses the effects of online access on scholarship — see, e.g., discussions here and here and blog entries noted therein.  The report is not available without paying a toll or subscription, but the abstract is freely visible:

Online journals promise to serve more information to more dispersed audiences and are more efficiently searched and recalled. But because they are used differently than print — scientists and scholars tend to search electronically and follow hyperlinks rather than browse or peruse — electronically available journals may portend an ironic change for science. Using a database of 34 million articles, their citations (1945 to 2005), and online availability (1998 to 2005), I show that as more journal issues came online, the articles referenced tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles. The forced browsing of print archives may have stretched scientists and scholars to anchor findings deeply into past and present scholarship. Searching online is more efficient and following hyperlinks quickly puts researchers in touch with prevailing opinion, but this may accelerate consensus and narrow the range of findings and ideas built upon.

This seems thoroughly counter-intuitive to me, since I find a good deal more information by direct search now that I can do it online, and browsing has never played a significant role in my literature searching.  (And remember, I’m old — I started out using Index Medicus!)  Who has time to browse probably-irrelevant journals and tables of contents on the off-chance that something might be useful?  I’m far more likely to stumble across things I’d never have otherwise found when I’m relying on a variety of relevance-based search algorithms (PubMed’s Related Articles, Google Scholar, NextBio, etc.).
For anyone who thinks that “forced browsing of print archives” makes a lick of sense: we’ll pick a topic, then you spend a day or two browsing in meatspace, and I’ll spend an hour searching online.  Who do you think is likely to come up with the best (most useful, most comprehensive) set of references?
Moreover, the article’s conclusions seem to be based on a couple of unspoken assumptions with which I don’t agree.
The first is that citing more and older references is somehow better — that bit about “anchor[ing] findings deeply into past and present scholarship”.  I don’t buy it.  Anyone who wants to read deeply into the past of a field can follow the citation trail back from more recent references, and there’s no point cluttering up every paper with every single reference back to Aristotle.  As you go further back there are more errors, mistaken models, lack of information, technical difficulties overcome in later work, and so on — and that’s how it’s supposed to work.  I’m not saying that it’s not worth reading way back in the archives, or that you don’t sometimes find overlooked ideas or observations there, but I am saying that it’s not something you want to spend most of your time doing.
Secondly, let’s take the author at his word:

I show that as more journal issues came online, the articles referenced tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles.

OK, suppose you do show that — it’s only a bad thing if you assume that the authors who are citing fewer and more recent articles are somehow ignorant of the earlier work.  They’re not: as I said, later work builds on earlier.  Evans makes no attempt to demonstrate that there is a break in the citation trail — that these authors who are citing fewer and more recent articles are in any way missing something relevant.  Rather, I’d say they’re simply citing what they need to get their point across, and leaving readers who want to cast a wider net to do that for themselves (which, of course, they can do much more rapidly and thoroughly now that they can do it online).
If that means citing fewer articles now than researchers tended to cite 20 years ago, it probably has more to do with changes in the culture of science than in the electronic availability of research papers.  For instance, I think it far more likely — to exaggerate, for the purposes of illustration, in the opposite direction to Evans — that earlier authors, unable to rapidly and comprehensively scan the literature, cited everything they could get their hands on, padding their bibliographies well beyond anything useful in an attempt to lend weight to their arguments.
It’s potentially worrisome if more citations are going to fewer journals, but once again I see no more reason to attribute that to increasing online availability than to attribute it to the sharply rising cost of scientific journals in any form.  It’s well documented that as journal prices have continued to rise, researchers and institutions have had to cut back on the number of subscriptions they take.  It is not difficult to imagine that “long tail” and “preferential attachment” phenomena (see, for instance, Evans’ own references 14 – 18, reproduced below) would drive the concentration of likely subscriptions towards a pool of “must have” journals.  Indeed, publishers actively promote the concept of such a pool and compete strongly to be seen to be part of it.
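The “preferential attachment” mechanism I’m invoking here is easy to see in miniature: if each new citation goes to a journal with probability proportional to the citations that journal has already accumulated, a handful of journals soon soak up a disproportionate share.  Here’s a toy simulation of my own (not anything from Evans’ paper; all names and parameters are mine, chosen just for illustration):

```python
import random

def simulate_citations(n_journals=50, n_citations=5000, seed=42):
    """Toy preferential-attachment model: each new citation picks a
    journal with probability proportional to (1 + citations so far).
    Returns per-journal citation counts, largest first."""
    rng = random.Random(seed)
    counts = [0] * n_journals
    for _ in range(n_citations):
        # "Rich get richer": weight each journal by its current tally.
        weights = [c + 1 for c in counts]
        j = rng.choices(range(n_journals), weights=weights)[0]
        counts[j] += 1
    return sorted(counts, reverse=True)

counts = simulate_citations()
top5_share = sum(counts[:5]) / sum(counts)
print(f"Top 5 of 50 journals capture {top5_share:.0%} of citations")
```

With equal starting weights you’d expect each journal to get about 2% of the citations; the feedback loop instead hands the top few journals a far larger share — no online access required, just the economics of “must have” subscription pools.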
Finally, and to me most importantly, Evans seems to me to gloss over the question of what proportion of the online archives are freely available, and what effect that has on the phenomenon he is attempting to model.  Here’s the crux of what he does say (fair use! fair use!):
[Evans’ Figure 2 appeared here.]
I’ve rearranged the figure so that what were left, middle and right panels are now top, center and bottom panels; in all graphs the abscissae are “Years of journal issues online” and the ordinates are “Herfindahl citation concentration”, which is explained as follows:

A concentration of 1 indicates that every citation to [a given] journal [or subfield] in a given year is to a single article; a concentration just less than 1 suggests a high proportion of citations pointing to just a few articles; and a concentration approaching zero implies that citations reach out evenly to a large number of articles.
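For the curious: the Herfindahl index described above is just the sum of squared citation shares.  A quick sketch of my own (the function name and examples are mine, not Evans’ code):

```python
def herfindahl(citation_counts):
    """Herfindahl concentration: sum of squared citation shares.

    1.0 means every citation goes to a single article; values near
    zero mean citations spread evenly over many articles.
    """
    total = sum(citation_counts)
    return sum((c / total) ** 2 for c in citation_counts)

# All citations to one article: maximal concentration.
print(herfindahl([10, 0, 0]))   # → 1.0
# Citations spread evenly over 100 articles: low concentration.
print(herfindahl([1] * 100))    # ≈ 0.01
```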

Here’s Evans’ interpretation of that data:

Figure 2C illustrates the concurrent influence of commercial and free online provision on the concentration of citations to particular articles and journals. The left panel shows that the number of years of commercial availability appears to significantly increase concentration of citations to fewer articles within a journal. If an additional 10 years of journal issues were to go online via any commercial source, the model predicts that its citation concentration would rise from 0.088 to 0.105, an increase of nearly 20%. Free electronic availability had a slight negative effect on the concentration of articles cited within journals, but it had a marginally positive effect on the concentration of articles cited within subfields (middle panel) and appeared to substantially drive up the concentration of citations to central journals within subfields (right panel). Commercial provision had a consistent positive effect on citation concentration in both articles and journals. The collective similarity between commercial and free access for all models discussed suggests that online access — whatever its source — reshapes knowledge discovery and use in the same way.

Wait, what?  Let me unpack that with a rewrite from my point of view:

The number of years of commercial availability appears to significantly increase concentration of citations to fewer articles within a journal, whereas free electronic availability had a negative effect on the concentration of articles cited within journals. If an additional 10 years of journal issues were to go online via any commercial source, the model predicts that its citation concentration would rise from 0.088 to 0.105, an increase of nearly 20%. In contrast, if an additional 10 years of journal issues were to go online via any free source, the model predicts that its citation concentration would drop from 0.088 to just under 0.08 [I had to estimate this by eye, since the data are not available], a decrease of around 10%. Similarly, free electronic availability had only a marginally positive effect on the concentration of articles cited within subfields. Only when considering concentration to journals within a subfield did free availability cause a substantial increase, and even then this effect was considerably less than that driven by commercial availability, which had a consistent positive effect on citation concentration in both articles and journals.
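For the record, the percentage figures in that rewrite check out (the 0.08 endpoint for free availability is, as noted, my own eyeball estimate from the figure):

```python
# Evans' model: citation concentration after 10 more years online.
baseline = 0.088
commercial_change = (0.105 - baseline) / baseline  # Evans' figures
free_change = (baseline - 0.08) / baseline         # estimated by eye

print(f"commercial: +{commercial_change:.1%}")  # → +19.3%, "nearly 20%"
print(f"free:       -{free_change:.1%}")        # → -9.1%, "around 10%"
```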

In other words, I take issue with the final sentence of the paragraph I quoted: commercial and free access do not show “collective similarity”.  On one of three measures they have the opposite effect, and on the other two measures commercial access has by far the stronger effect.
What this suggests to me is that the driving force in Evans’ suggested “narrow[ing of] the range of findings and ideas built upon” is not online access per se but in fact commercial access, with its attendant question of who can afford to read what.  Evans’ own data indicate that if the online access in question is free of charge, the apparent narrowing effect is significantly reduced or even reversed.  Moreover, the commercially available corpus is and has always been much larger than the freely available body of knowledge (for instance, DOAJ currently lists around 3500 journals, approximately 10-15% of the total number of scholarly journals).  This suggests that if all of the online access that went into Evans’ model had been free all along, the anti-narrowing effect of Open Access would be considerably amplified.
In fact, the comparison between print and online access is barely even possible when considering Open Access information.  The same considerations of cost — who can afford to read what — apply to commercial print and online publications, but free online information has essentially no print ancestor or equivalent.  Few if any scholarly journals were ever free in print, so there’s a huge difference between conversion from commercial print to commercial online on the one hand, and from commercial print to Open Access on the other.
Indeed, I would suggest that if the entire body of scholarly literature were Openly available, so that every researcher could read everything they could find and programmers were free to build search algorithms over a comprehensive database to help the researchers do that finding, then in fact the opposite effect would obtain.  Perhaps it’s true that the more commercial online access you have, the less widely a researcher’s literature search net is cast, but as I mentioned above I see no reason to attribute that more to the mode of access than to its cost.
In support of this assertion, consider the expanding body of literature on the Open Access “citation advantage” — studies which show that the likelihood of a given paper being cited is increased up to several hundred percent if the paper is OA rather than commercially available.  There is some controversy over that literature, but it stands in direct contrast to the idea that online access of any kind tends to narrow citation reach.
There are more data in Evans’ paper that speak to the free-vs-commercial issue, and some of those data show free access having a stronger “narrowing” effect than commercial access.  I’d go through it in detail, but I am probably already pushing the limits of fair use so I’ll have to refer you to the published article — in particular, Figure 2 panels A and B.  My response is much the same, that the apparent effect suffers from a loading in “favour” of commercial access, because of the wildly disparate sizes of the two different bodies of online literature. 

Lie down with pit bulls, wake up with a blogospheric flea in your ear.

This clumsy hatchet job from Nature reporter Declan Butler is beneath him, a poor excuse for journalism and an affront to the respect with which many of his colleagues are regarded by the research community.
Let’s start with the title: “PLoS stays afloat with bulk publishing”. Loaded rhetoric, anyone? The clear implications are that PLoS is floundering (Butler’s own numbers show otherwise!), and that “bulk” is somehow inferior (to, one presumes, “boutique” or some such). PLoS is “following an haute couture model of science publishing” sniffs our correspondent, who goes on to clarify: “relying on bulk, cheap publishing of lower quality papers to subsidize its handful of high-quality flagship journals”.
This emphasis on “quality” and the idea that the same somehow equates with scarcity continues throughout: “the company consciously decided to subsidize its top-tier titles by publishing second-tier community journals with high acceptance rates”, “the flood of articles appearing in PLoS One (sic)”, “difficult to judge the overall quality”, “because of this volume, it’s going to be considered a dumping ground”, “introduces a sub-standard journal to their mix”.
The intent is obvious, and the illogic is boggling. Where does Butler think the majority of science is published? Even if you buy into this nebulous idea of “quality” (one knows it when one sees it, does one not old chap? wot wot?) there can be no “great brand” journals without the denim-clad proletarian masses. All the painstaking, unspectacular groundwork for those big flashy headline-grabbing (and, dare I say it, all too often retracted) Nature front-pagers has got to go somewhere.
It gets much worse, though, when we get some measure of what Butler thinks “quality” means:

Papers submitted to PLoS One (sic) are sent to a member of its editorial board of around 500 researchers, who may opt to review it themselves or send it to their choice of referee. But referees only check for serious methodological flaws, and not the importance of the result.

That, along with an earlier remark about “a system of ‘light’ peer review”, is a blatant and serious misrepresentation of PLoS ONE’s review process. Here’s the actual policy:

The peer review of each article concentrates on objective and technical concerns to determine whether the research has been sufficiently well conceived, well executed, and well described to justify inclusion in the scientific record. […]
Unlike many journals which attempt to use the peer review process to determine whether or not an article reaches the level of ‘importance’ required by a given journal, PLoS ONE uses peer review to determine whether a paper is technically sound and worthy of inclusion in the published scientific record. […]
To be considered for publication in PLoS ONE, any given manuscript must satisfy the following criteria:

  • Content must report on original research (in any scientific discipline).
  • Results reported have not been published elsewhere.
  • Experiments, statistics, and other analyses are performed to a high technical standard.
  • Conclusions are presented in an appropriate fashion and supported by the text.
  • Techniques used have been documented in sufficient detail to allow replication.
  • Reports are presented in an intelligible fashion and written in standard English.
  • Research meets all applicable standards, including the Helsinki Declaration, with regard to the ethics of human and animal experimentation, consent, and research integrity.
  • Report adheres to the relevant community standards for research, reporting, and deposition of data. (Standards PLoS promotes across its journals).

Which is to say that PLoS ONE* holds authors to exactly the same scientific standards that every journal should follow. Which is to say that any methodological flaws, not “only… serious” ones, will see a paper revised, or rejected if the flaws can’t be overcome. Which is to say that PLoS ONE uses peer review to do what it was designed to do, not to create an artificial scarcity from which to milk profit with scant regard for the integrity of the scientific record. That’s not “light” peer review, it’s real peer review.
With this scurrilous parroting of anti-OA FUD, Nature makes pretty clear where its interests and its allies are.  Well, you know what happens when you lie down with pit bulls…
There’s a lot more, but that was the issue that pushed my buttons the hardest. See Bora for a roundup of responses; here’s a quick outline of some of the key issues:
  • Jan Velterop, responding to Butler’s last “investigation” of PLoS finances two years ago, pointed out that it’s ridiculous to expect a new journal with a new business model to break even in a few years, when new journals from established publishers take up to a decade to achieve the same goal; DrugMonkey also mentions the “so what” nature of this complaint.
  • Jonathan Eisen remarks that somehow Butler gets from “PLoS ONE is doing well and making money” to “PLoS is a failure”; go read Jonathan to see how twisted your logic has to be to make that particular trip. (Jonathan also provides an important reminder, that we should not confuse Nature Publishing Group as a whole with their many talented and well intentioned employees!)
  • Grrlscientist observes that, while Butler’s piece makes it sound as though PLoS’ reliance on donations were a bad thing, all journals rely on the donation of time and expertise by unpaid reviewers.
  • DrugMonkey, Jonathan and Grrlscientist all make the point that Nature has its own stable of “second tier” journals with “lower barriers to entry” — the same mechanism for which Butler criticizes PLoS.
  • Stevan Harnad is famous for making the point (here, for example) that if the funds currently draining into subscriptions were used to pay OA costs, there would be an immense improvement in the utility of the scientific record even if there were no financial saving.
Finally, pretty much every commenter has pointed out the glaring lack of any “conflict of interest” statement on the Nature piece — having said which, I’d better make one of my own. It’s well known and obvious at a glance at this blog that my favorite drink is the Open Access Kool-Aid. I have personal friends who work for PLoS, and I’ve previously applied to work there myself.

* originally in lowercase — so much for my snotty (sic)s!

Calling all bioinformaticians…

Mike of Bioinformatics Zen is looking for information; please help him out if you can, by taking the survey either here or at BZ. Take particular note of the following:

The raw data entered into this questionnaire, along with any interpretation will be released into the public domain under a creative commons attribution license. If you are unhappy with answering any of the questions please leave them blank. By completing this questionnaire you consent to your answers being released.

(Yes, I know it’s repeated at the top of the survey: it’s important.)