Peer review was introduced to scholarly publication in 1731 by the Royal Society of Edinburgh, which published a collection of peer-reviewed medical articles. Despite this early start, at many scientific journals the editors had the only say on whether an article would be published until after World War II. “Science and The Journal of the American Medical Association did not use outside reviewers until after 1940” (Spier, 2002). The Lancet did not implement peer review until 1976 (Benos et al., 2007). After the war and into the fifties and sixties, the specialization of articles increased and so did the competition for journal space. Technological advances (photocopying!) made it easier to distribute extra copies of articles to reviewers. Today, peer review is the “gold standard” in the evaluation of everything from scholarly publications to grants to tenure decisions (in this post I will focus on scholarly publication). It has been “elevated to a ‘principle’ — a unifying principle for a remarkably fragmented field” (Biagioli, 2002).
Peer review takes time and effort, and it can delay publication considerably. It’s one of the bottlenecks of scholarly publishing. We can distribute articles very quickly these days, but there is a limited number of experts in every field to review them, and their time is occupied by the other tasks of academic life. Blaise Cronin, editor of JASIST, estimated that JASIST needs about 1,000 peer reviews a year (one person, of course, can review more than once) for 400 articles, and that JASIST approaches about 3,000 researchers to find the 1,000 peer reviewers.
Single or double blind?
Traditional peer review is usually single or double blind. Single-blind review, where the reviewers are aware of the authors’ identities but the authors are not aware of the reviewers’ identities, is the most common. In double-blind review, neither the authors nor the reviewers are aware of the others’ identities. Authors tend to believe double-blind reviews are better and less biased, in principle, than single-blind ones, but they also doubt whether true blinding is possible (see review in Lee et al., 2013). A survey of editors-in-chief, editors and editorial board members of 590 chemistry journals found that 97% of the journals didn’t offer double-blind peer review. Most respondents considered double blinding needless because the content and references could not be truly masked. They thought double blinding would make the detection of fraud harder and considered the system satisfactory as it was (Brown, 2006).
An example: at my new workplace we hold an annual conference on e-learning in higher education. When we receive proposals, one of my jobs is to anonymize them: I remove the authors’ names and make sure that wherever they wrote their institute’s name it is replaced with simply “the institute”. Unfortunately, the Israeli e-learning community is so small that the process is pretty useless, because everyone knows everyone and everyone knows what goes on in which institute. A study by Justice et al. (1998) reached similar conclusions. They masked the authors’ identities in manuscripts submitted to five prominent medical journals before sending them out for review, but about 30% of the reviewers were able to identify the authors regardless (perhaps because self-references in the text weren’t removed). In small research fields this number can go even higher.
Peer review, fraud, and mistakes
Whenever cases of fraud or significant flaws in published articles are revealed, it seems natural to wonder “where were the peer reviewers?” Researcher Jan Hendrik Schön, for example, published over a hundred articles in four years (1998-2002), and the reviewers failed to detect 16 cases of misconduct in his articles. Sometimes, it’s simply a matter of chance. The blog Retraction Watch recently reported that after a colleague pointed out an (honest) error in their article, a group of authors retracted a paper from Physical Review Letters. Had this colleague been their pre-publication peer reviewer, the article wouldn’t have been published to begin with.
In one study, eight weaknesses were inserted into an article already accepted for publication, and the article was sent to JAMA reviewers (200 respondents), who found fewer than two weaknesses each on average. Sixteen percent didn’t find any weakness and only 10% found more than four (Godlee et al., 1998). Callaham et al. (1998) sent a fake manuscript containing 23 deliberate flaws to editors of Annals of Emergency Medicine and to all the journal’s peer reviewers who had reviewed at least three manuscripts before the study. On average, the reviewers detected 3.4 of the ten major flaws in the manuscript and 3.1 of the 13 minor ones. Peer reviewers are the ‘gate-keepers’ of science, but their gate-keeping is far from perfect.
Inter-rater reliability of peer review
Peer review tends to have low levels of inter-rater reliability between reviewers (0.2-0.4). This, at least from a statistical point of view, makes it pretty unreliable. However, this might not be a bad thing: “Too much agreement is in fact a sign that the review process is not working well, that reviewers are not properly selected for diversity, and that some are redundant” (Bailar, 1991). It could be that the reviewers give different weights to different qualities of the reviewed article, or that the article’s subject has not reached a scientific consensus (e.g., altmetrics: useful tools for evaluation, or rubbish?).
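To give a feel for what a reliability of around 0.4 means, here is a minimal sketch that assumes the agreement statistic is Cohen's kappa (the studies behind the 0.2-0.4 range may have used other coefficients, such as intraclass correlations). The two reviewers and their recommendations are made up for illustration, and `cohen_kappa` is just a helper name chosen for this example.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Share of manuscripts on which the two reviewers gave the same verdict.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each reviewer's own verdict frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used one and the same verdict throughout
    return (observed - expected) / (1 - expected)


# Hypothetical recommendations for ten manuscripts (A = accept, R = reject).
reviewer_1 = ["A", "A", "A", "A", "A", "R", "R", "R", "R", "R"]
reviewer_2 = ["A", "A", "A", "R", "R", "R", "R", "R", "R", "A"]

print(f"{cohen_kappa(reviewer_1, reviewer_2):.2f}")  # prints 0.40
```

In this made-up example the reviewers agree on seven of ten manuscripts, yet kappa is only 0.4, because half of that agreement would be expected by chance alone.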
I’m well aware that this post covers only a small part of the discussions and arguments about peer review (note that the post is called “Introduction to traditional peer review”), and I hope to discuss the topic again in the future.
Bailar, J. (1991). Reliability, fairness, objectivity and other inappropriate goals in peer review. Behavioral and Brain Sciences, 14 (01), 137-138 DOI: 10.1017/S0140525X00065705
Biagioli, M. (2002). From Book Censorship to Academic Peer Review. Emergences: Journal for the Study of Media & Composite Cultures, 12 (1), 11-45 DOI: 10.1080/1045722022000003435
Benos DJ, Bashari E, Chaves JM, Gaggar A, Kapoor N, LaFrance M, Mans R, Mayhew D, McGowan S, Polter A, Qadri Y, Sarfare S, Schultz K, Splittgerber R, Stephenson J, Tower C, Walton RG, & Zotov A (2007). The ups and downs of peer review. Advances in physiology education, 31 (2), 145-52 PMID: 17562902
Bornmann, L. (2008). Scientific Peer Review: An Analysis of the Peer Review Process from the Perspective of Sociology of Science Theories. Human Architecture: Journal of the Sociology of Self-Knowledge, 6 (2)
Brown, R. (2006). Double Anonymity and the Peer Review Process. The Scientific World Journal, 6, 1274-1277 DOI: 10.1100/tsw.2006.228
Callaham ML, Baxt WG, Waeckerle JF, & Wears RL (1998). Reliability of editors' subjective quality ratings of peer reviews of manuscripts. JAMA : the journal of the American Medical Association, 280 (3), 229-31 PMID: 9676664
Godlee, F., Gale, C., & Martyn, C. (1998). Effect on the Quality of Peer Review of Blinding Reviewers and Asking Them to Sign Their Reports. JAMA, 280 (3) DOI: 10.1001/jama.280.3.237
Lee, C. J., Sugimoto, C. R., Zhang, G., & Cronin, B. (2013). Bias in Peer Review. JASIST, 64 (1), 2-17
Spier R (2002). The history of the peer-review process. Trends in biotechnology, 20 (8), 357-8 PMID: 12127284