There’s been a good deal of online chatter about this recent Science article that discusses the effects of online access on scholarship — see, e.g., discussions here and here and blog entries noted therein. The report is not available without paying a toll or subscription, but the abstract is freely visible:
Online journals promise to serve more information to more dispersed audiences and are more efficiently searched and recalled. But because they are used differently than print — scientists and scholars tend to search electronically and follow hyperlinks rather than browse or peruse — electronically available journals may portend an ironic change for science. Using a database of 34 million articles, their citations (1945 to 2005), and online availability (1998 to 2005), I show that as more journal issues came online, the articles referenced tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles. The forced browsing of print archives may have stretched scientists and scholars to anchor findings deeply into past and present scholarship. Searching online is more efficient and following hyperlinks quickly puts researchers in touch with prevailing opinion, but this may accelerate consensus and narrow the range of findings and ideas built upon.
This seems thoroughly counter-intuitive to me, since I find a good deal more information by direct search now that I can do it online, and browsing has never played a significant role in my literature searching. (And remember, I’m old — I started out using Index Medicus!) Who has time to browse probably-irrelevant journals and tables of contents on the off chance that something might be useful? I’m far more likely to stumble across things I’d never otherwise have found when I’m relying on a variety of relevance-based search algorithms (PubMed’s Related Articles, Google Scholar, NextBio, etc.).
For anyone who thinks that “forced browsing of print archives” makes a lick of sense: we’ll pick a topic, then you spend a day or two browsing in meatspace, and I’ll spend an hour searching online. Who do you think is likely to come up with the best (most useful, most comprehensive) set of references?
Moreover, the article’s conclusions seem to be based on a couple of unspoken assumptions with which I don’t agree.
The first is that citing more and older references is somehow better — that bit about “anchor[ing] findings deeply into past and present scholarship”. I don’t buy it. Anyone who wants to read deeply into the past of a field can follow the citation trail back from more recent references, and there’s no point cluttering up every paper with every single reference back to Aristotle. The further back you go, the more you run into errors, mistaken models, missing information, technical difficulties that were only overcome in later work, and so on — and that’s how it’s supposed to work. I’m not saying that it’s not worth reading way back in the archives, or that you don’t sometimes find overlooked ideas or observations there, but I am saying that it’s not something you want to spend most of your time doing.
Secondly, let’s take the author at his word:
I show that as more journal issues came online, the articles referenced tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles.
OK, suppose you do show that — it’s only a bad thing if you assume that the authors who are citing fewer and more recent articles are somehow ignorant of the earlier work. They’re not: as I said, later work builds on earlier. Evans makes no attempt to demonstrate that there is a break in the citation trail — that these authors who are citing fewer and more recent articles are in any way missing something relevant. Rather, I’d say they’re simply citing what they need to get their point across, and leaving readers who want to cast a wider net to do that for themselves (which, of course, they can do much more rapidly and thoroughly now that they can do it online).
If that means citing fewer articles now than researchers tended to cite 20 years ago, it probably has more to do with changes in the culture of science than in the electronic availability of research papers. For instance, I think it far more likely — to exaggerate, for the purposes of illustration, in the opposite direction to Evans — that earlier authors, unable to rapidly and comprehensively scan the literature, cited everything they could get their hands on, padding their bibliographies well beyond anything useful in an attempt to lend weight to their arguments.
It’s potentially worrisome if more citations are going to fewer journals, but once again I see no more reason to attribute that to increasing online availability than to attribute it to the sharply rising cost of scientific journals in any form. It’s well documented that as journal prices have continued to rise, researchers and institutions have had to cut back on the number of subscriptions they take. It is not difficult to imagine that “long tail” and “preferential attachment” phenomena (see, for instance, Evans’ own references 14 – 18, reproduced below) would drive the concentration of likely subscriptions towards a pool of “must have” journals. Indeed, publishers actively promote the concept of such a pool and compete strongly to be seen to be part of it.
Finally, and to me most importantly, Evans seems to me to gloss over the question of what proportion of the online archives are freely available, and what effect that has on the phenomenon he is attempting to model. Here’s the crux of what he does say (fair use! fair use!):
I’ve rearranged the figure so that what were left, middle and right panels are now top, center and bottom panels; in all graphs the abscissae are “Years of journal issues online” and the ordinates are “Herfindahl citation concentration”, which is explained as follows:
A concentration of 1 indicates that every citation to [a given] journal [or subfield] in a given year is to a single article; a concentration just less than 1 suggests a high proportion of citations pointing to just a few articles; and a concentration approaching zero implies that citations reach out evenly to a large number of articles.
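For the quantitatively inclined, the Herfindahl measure described above is just the sum of squared citation shares. Here’s a minimal sketch in Python (the function name and the toy citation counts are mine, for illustration only):

```python
def herfindahl(citation_counts):
    """Herfindahl concentration of citations.

    citation_counts: number of citations received by each article in a
    journal (or each journal in a subfield) in a given year.
    """
    total = sum(citation_counts)
    shares = [c / total for c in citation_counts]
    return sum(s * s for s in shares)

# Every citation goes to a single article -> concentration of exactly 1
print(herfindahl([25]))

# Citations spread evenly over many articles -> approaches zero (here 0.01)
print(herfindahl([1] * 100))

# A few heavily cited articles among many -> high concentration (about 0.81)
print(herfindahl([90, 5, 3, 1, 1]))
```

This makes the scale of the numbers in the figure easier to read: values like 0.088 correspond to citations spread fairly evenly across many articles.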
Here’s Evans’ interpretation of that data:
Figure 2C illustrates the concurrent influence of commercial and free online provision on the concentration of citations to particular articles and journals. The left panel shows that the number of years of commercial availability appears to significantly increase concentration of citations to fewer articles within a journal. If an additional 10 years of journal issues were to go online via any commercial source, the model predicts that its citation concentration would rise from 0.088 to 0.105, an increase of nearly 20%. Free electronic availability had a slight negative effect on the concentration of articles cited within journals, but it had a marginally positive effect on the concentration of articles cited within subfields (middle panel) and appeared to substantially drive up the concentration of citations to central journals within subfields (right panel). Commercial provision had a consistent positive effect on citation concentration in both articles and journals. The collective similarity between commercial and free access for all models discussed suggests that online access — whatever its source — reshapes knowledge discovery and use in the same way.
Wait, what? Let me unpack that with a rewrite from my point of view:
The number of years of commercial availability appears to significantly increase concentration of citations to fewer articles within a journal, whereas free electronic availability had a negative effect on the concentration of articles cited within journals. If an additional 10 years of journal issues were to go online via any commercial source, the model predicts that its citation concentration would rise from 0.088 to 0.105, an increase of nearly 20%. In contrast, if an additional 10 years of journal issues were to go online via any free source, the model predicts that its citation concentration would drop from 0.088 to just under 0.08 [I had to estimate this by eye, since the data are not available], a decrease of around 10%. Similarly, free electronic availability had only a marginally positive effect on the concentration of articles cited within subfields. Only when considering concentration to journals within a subfield did free availability cause a substantial increase, and even then this effect was considerably less than that driven by commercial availability, which had a consistent positive effect on citation concentration in both articles and journals.
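As a quick sanity check on those percentages (remember that the 0.08 figure for free availability is my own estimate by eye from the graph, not a published number):

```python
def pct_change(before, after):
    """Percentage change from one concentration value to another."""
    return (after - before) / before * 100

# Ten more years of commercial availability: 0.088 -> 0.105,
# an increase of about 19.3% -- Evans' "nearly 20%"
print(pct_change(0.088, 0.105))

# Ten more years of free availability: 0.088 -> ~0.08 (estimated by eye),
# a decrease of about 9.1% -- my "around 10%"
print(pct_change(0.088, 0.08))
```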
In other words, I take issue with the final sentence of the paragraph I quoted: commercial and free access do not show “collective similarity”. On one of three measures they have the opposite effect, and on the other two measures commercial access has by far the stronger effect.
What this suggests to me is that the driving force in Evans’ suggested “narrow[ing of] the range of findings and ideas built upon” is not online access per se but in fact commercial access, with its attendant question of who can afford to read what. Evans’ own data indicate that if the online access in question is free of charge, the apparent narrowing effect is significantly reduced or even reversed. Moreover, the commercially available corpus is and has always been much larger than the freely available body of knowledge (for instance, DOAJ currently lists around 3500 journals, approximately 10-15% of the total number of scholarly journals). If a freely available corpus that small can already counteract the narrowing effect, then had all of the online access that went into Evans’ model been free all along, the anti-narrowing effect of Open Access would presumably have been considerably amplified.
In fact, the comparison between print and online access is barely even possible when considering Open Access information. The same considerations of cost — who can afford to read what — apply to commercial print and online publications, but free online information has essentially no print ancestor or equivalent. Few if any scholarly journals were ever free in print, so there’s a huge difference between conversion from commercial print to commercial online on the one hand, and from commercial print to Open Access on the other.
Indeed, I would suggest that if the entire body of scholarly literature were Openly available, so that every researcher could read everything they could find and programmers were free to build search algorithms over a comprehensive database to help the researchers do that finding, then in fact the opposite effect would obtain. Perhaps it’s true that the more commercial online access you have, the less widely a researcher’s literature search net is cast, but as I mentioned above I see no reason to attribute that more to the mode of access than to its cost.
In support of this assertion, consider the expanding body of literature on the Open Access “citation advantage” — studies which show that the likelihood of a given paper being cited is increased up to several hundred percent if the paper is OA rather than commercially available. There is some controversy over that literature, but it stands in direct contrast to the idea that online access of any kind tends to narrow citation reach.
There are more data in Evans’ paper that speak to the free-vs-commercial issue, and some of those data show free access having a stronger “narrowing” effect than commercial access. I’d go through it in detail, but I am probably already pushing the limits of fair use so I’ll have to refer you to the published article — in particular, Figure 2 panels A and B. My response is much the same, that the apparent effect suffers from a loading in “favour” of commercial access, because of the wildly disparate sizes of the two different bodies of online literature.