Can a Google algorithm identify the best scientific research?






How can one quantify the importance of a given scientific paper? One simple and frequently utilized measure is the number of times that paper is cited in subsequent publications. But critics note that counting citations favors disciplines such as biology, where papers tend to be cited more, over fields such as mathematics, where citations are less frequent. In addition, a citation from a relatively marginal paper counts just the same as a citation from a leading researcher publishing in a marquee journal.

In a study published in October in the Journal of Neuroscience and recently made available at the online repository arxiv.org, physicists Sergei Maslov of Brookhaven National Laboratory and Sidney Redner of Boston University examine the value of Google's PageRank algorithm as it applies to ranking scientific works. (The researchers rightly point out that no quantitative system can truly "value" a scientific work—but since many such metrics are already in use, it stands to reason that they should be improved.)

Instead of hyperlinks, their version of PageRank takes journal citations as the fundamental link of a hypothetical network. By traversing this network randomly from node to node, PageRank gives a higher rank to papers that are better connected—that is, papers cited by papers that are, in turn, frequently cited. It also accounts for the varying citation habits of different disciplines—a paper that cites a small number of predecessors confers more value on its citations in the PageRank algorithm than does a paper that cites dozens of past works.
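To make the mechanics concrete, here is a minimal Python sketch of PageRank run on a toy citation network. It illustrates the general algorithm rather than the authors' own code; the function name, damping value and toy data are assumptions for the example.

    import numpy as np

    def pagerank(citations, d=0.85, iterations=100):
        """citations[i] lists the indices of the papers that paper i cites."""
        n = len(citations)
        rank = np.full(n, 1.0 / n)                 # start with every paper equal
        for _ in range(iterations):
            new_rank = np.full(n, (1.0 - d) / n)   # the bored surfer restarts anywhere
            for i, refs in enumerate(citations):
                if refs:
                    # A paper splits its rank evenly among the works it cites,
                    # so a short reference list confers more value per citation.
                    share = d * rank[i] / len(refs)
                    for j in refs:
                        new_rank[j] += share
                else:
                    # Papers that cite nothing spread their rank uniformly.
                    new_rank += d * rank[i] / n
            rank = new_rank
        return rank

    # Toy network: papers 1-3 all cite paper 0; paper 3 also cites paper 1.
    print(pagerank([[], [0], [0], [0, 1]]))

In this toy run, paper 0, cited directly or indirectly by everything else, ends up with the highest score, while papers that cite others but are never cited themselves sit at the bottom.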

The physicists make one modification to the Google algorithm: they boost the "boredom factor" at which a hypothetical network surfer is presumed to give up. Whereas a Web user might follow a chain of about six hyperlinks before moving on, a researcher digging through the scientific literature might go back only two steps.
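In the usual formulation the boredom factor is the complement of the damping constant d, and under one common convention a random walk visits about 1/(1 - d) papers before the surfer gives up. A quick back-of-the-envelope check reproduces the six-link versus two-step contrast; the specific d values below are assumptions, since the article does not quote them.

    # Average walk length under one common convention: 1 / (1 - d).
    for d, reader in ((0.85, "Web surfer (typical PageRank default)"),
                      (0.5, "literature reader (assumed here)")):
        print(f"{reader}: d = {d}, average chain of about {1 / (1 - d):.1f} links")
    # -> roughly 6.7 links for the Web default and 2.0 for the citation walk.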

Henry Small, chief scientist of the scientific division of Thomson Reuters, a major provider of data on peer-reviewed publishing, says that the PageRank approach is one of many that have emerged in recent years as ever more information becomes available online.

"The field has literally exploded in the last five years or so," he says. And although standard approaches such as citation counts may be less-than-perfect measures of scientific influence, he notes, "there's a trade-off for all of these."

Applied to the physics journals of the American Physical Society, including the prestigious Physical Review Letters, PageRank turns up a veritable who's who of physics luminaries, including a suite of Nobelists. The problem, as Small points out, is that PageRank has a certain "time-lag effect." While the algorithm is "good for identifying classic papers," he says, it can take years for an important paper to develop a broad enough network of citation links to leap into the upper PageRank echelon.

Maslov and Redner correct for this effect by introducing an exponential preference for more recent papers—interestingly, this factor seems to align the results more closely with traditional citation counts. You can test their adjusted algorithm, dubbed CiteRank, on Maslov's Web page.
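A rough sketch of how such a recency weight can be bolted onto the random-walk picture is shown below. It reflects one reading of the CiteRank idea rather than Maslov and Redner's actual code, and the damping value and decay time are placeholder assumptions.

    import numpy as np

    def citerank(citations, ages, d=0.5, tau=2.6, iterations=100):
        """ages[i] = years since paper i appeared; tau = assumed decay time in years."""
        n = len(citations)
        # Walkers start (and restart) at recent papers with exponentially higher probability.
        restart = np.exp(-np.asarray(ages, dtype=float) / tau)
        restart /= restart.sum()
        rank = restart.copy()
        for _ in range(iterations):
            new_rank = (1.0 - d) * restart
            for i, refs in enumerate(citations):
                if refs:
                    share = d * rank[i] / len(refs)
                    for j in refs:
                        new_rank[j] += share
                else:
                    # Dead ends also restart preferentially at recent papers.
                    new_rank += d * rank[i] * restart
            rank = new_rank
        return rank

Because the restart distribution favors new papers, recent work no longer has to wait years to accumulate a deep network of citation links before it can rank highly.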

Photo ©iStockphoto.com/Felix Möckel