May 22, 2012

Discussion of scholarly information in research blogs

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American

As some of you know, Mike Thelwall, Judit Bar-Ilan (both my dissertation advisors) and I published an article called "Research Blogs and the Discussion of Scholarly Information" in PLoS One. Many people showed interest in the article, so I thought I'd write a "director's commentary" post. Naturally, I'm saving all your tweets and blog posts for later research.

The Sample

We characterized 126 blogs with 135 authors from Researchblogging.Org (RB), an aggregator of blog posts dealing with peer-reviewed research. Two over-achievers had two blogs each, and 11 blogs had two authors.

While our interest in research blogs started before we ever heard of RB, it was reading an article using RB that really kick-started the project. Groth & Gurney (2010) wrote an article titled "Studying scientific discourse on the Web using bibliometrics: A chemistry blogging case study." The article made for a fascinating read, because it applied bibliometric methods to blogs. Just like it says in the title, Groth & Gurney took the references from 295 blog posts about Chemistry and analyzed them the way one would analyze citations from peer-reviewed articles. They managed that because they used RB, which aggregates only posts by bloggers who take the time to formally cite their sources. Major drooling ensued at that point. People citing in a scholarly manner of their own free will? It's Christmas!

We didn't choose RB because it was the best representation of science blogs in general; we chose it because of the structural citations the bloggers used. The blog posts aggregated by RB made for excellent citation-meme carriers. As Prof. Bar-Ilan puts it, RB posts are the "transition phase" between the citation in formal communication and the free-form writing of blogs. It's a bibliometric Archaeopteryx. In hindsight, we should have emphasized our interest in the blogations (blog citations) more, which is why this post is called "Discussion of scholarly information in research blogs" rather than the article's actual name.

Archaeopteryx (Model of Archaeopteryx lithographica in the Oxford University Museum of Natural History. Photograph taken by Michael Reeve, 30 May 2004.)

RB also had the advantage of human editors, who decide which of the applying blogs are aggregated, so we were spared weeding the pseudo-science and spam blogs out of our data. We narrowed the sample further to blogs that had at least 20 posts in RB by January 2011, to make sure we had enough blogations "fodder" for research, and that had only one or two authors. The last rule was set to ensure the blogs had at least basic similarities, so we wouldn't be comparing apples and oranges.
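The two selection rules can be written down as a simple filter. The following is a minimal Python sketch for illustration only, not the code used in the study: the `Blog` record, its field names and the example entries are all hypothetical, and only the two thresholds (at least 20 RB posts, one or two authors) come from the text.

```python
from dataclasses import dataclass

MIN_RB_POSTS = 20   # at least 20 posts aggregated in RB (threshold from the text)
MAX_AUTHORS = 2     # only one or two authors per blog (threshold from the text)


@dataclass
class Blog:
    """Hypothetical record for one blog; the field names are assumptions."""
    name: str
    rb_post_count: int   # posts aggregated by RB up to the cutoff date
    author_count: int


def in_sample(blog: Blog) -> bool:
    """Return True if the blog meets both selection rules."""
    enough_posts = blog.rb_post_count >= MIN_RB_POSTS
    few_authors = 1 <= blog.author_count <= MAX_AUTHORS
    return enough_posts and few_authors


def select_sample(blogs: list[Blog]) -> list[Blog]:
    """Keep only the blogs that pass the filter, preserving input order."""
    return [b for b in blogs if in_sample(b)]


# Made-up entries, only to show the filter at work.
candidates = [
    Blog("solo-blog", rb_post_count=45, author_count=1),
    Blog("duo-blog", rb_post_count=20, author_count=2),
    Blog("too-few-posts", rb_post_count=12, author_count=1),
    Blog("group-blog", rb_post_count=80, author_count=5),
]
sample = select_sample(candidates)
print([b.name for b in sample])
```

Of the four made-up candidates, only the first two pass: one is dropped for having too few posts and the other for having too many authors.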

However, choosing RB also means we chose only the bloggers who use it, a self-selecting population. There could be (in fact, there are) many bloggers who use structural citations, but either don't bother with RB or aren't even aware of its existence. RB has its own biases, which are very visible when one looks at its tagging system: Biology has 28 subtags and Psychology has 21, while whole disciplines like History and Sociology are only subtags under "Social Sciences".

Most blog-cited journals

Groth and Gurney have already found that Chemistry blog posts cite research from high-impact journals, but we had to make sure the same goes for other disciplines. We extracted blogations from the last five posts of each of the 126 blogs in the sample, and you can see the results here:

Science bloggers, like most scientists, cite journals with high impact factors. Science, Nature and PNAS are the highest-impact journals in the Journal Citation Reports (JCR) "Multidisciplinary" category. We suggest several explanations:

Bloggers cite what they know – almost 60% of the bloggers were affiliated with a research institute at the time of the study. Twenty-seven percent were graduate students, and 32% had a PhD. Given these journals' prestige in the academic world, it's expected that bloggers read and cite them often. Just because we're in the realm of blogging doesn't mean the bloggers weren't influenced by academic norms (given their use of citations, we're pretty sure they were influenced by some…).

Another explanation is the media's preference for high-impact journals (prestigious peer-reviewed journals have "authority", which the media love). If bloggers want to comment on the media's coverage of articles, they have to cite the same articles the media use.

Distribution of bloggers’ education levels

However, PLoS One, which is fourth in the most-cited list, ranks only 12th out of 86 journals in the JCR's "Biology" category. It's still in the first quartile of the Biology category, but its ranking and impact factor aren't nearly as high as those of Nature, Science and PNAS. We can assume that PLoS One is popular with research bloggers because it's one of the best-known open access journals, and it mostly publishes biology research (RB has a Life Science bias). Also, PLoS One is a huge journal (4,403 items published in 2009, in comparison with 866 and 897 in Nature and Science, respectively), so the chances of running into PLoS One blogations were statistically higher. Our results look very much like the ranking on Mendeley (a popular academic bookmarking site).
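The arithmetic behind the quartile and journal-size remarks is easy to check. The short Python sketch below uses only the figures quoted in the paragraph above; the variable names are made up for the example, and the quartile test assumes the usual reading of "first quartile" as the top 25% of journals in a category by rank.

```python
# Figures quoted in the text.
plos_one_rank = 12       # PLoS One's rank in the JCR "Biology" category
biology_journals = 86    # number of journals in that category
items_2009 = {"PLoS One": 4403, "Nature": 866, "Science": 897}

# Assumption: a journal is in the first quartile if its rank is within the top 25%.
rank_share = plos_one_rank / biology_journals
in_first_quartile = rank_share <= 0.25
print(f"rank share: {rank_share:.1%}, first quartile: {in_first_quartile}")

# How many times more items PLoS One published than each of the other two.
size_ratios = {
    journal: items_2009["PLoS One"] / items_2009[journal]
    for journal in ("Nature", "Science")
}
for journal, ratio in size_ratios.items():
    print(f"PLoS One vs {journal}: {ratio:.1f}x")
```

By these figures PLoS One sits at roughly the top 14% of its category, and it published about five times as many items as either Nature or Science in 2009.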

Who moved my data?

Data are always a tricky part of Web research. When dealing with articles, citations from peer-reviewed journals and so forth, the data are more or less stable. It's true that articles get retracted and journals change their names, but they do so at a manageable pace. My data, on the other hand, keep running away from me. Geologists Anne Jefferson and Chris Rowan's joint Twitter list for Highly Allochthonous went extinct, and all that's left are the two bloggers' individual accounts. Ed Yong now has 19,598 Twitter followers instead of the mere 11,638 he had when the paper was written. By the time I get this post up, he'll undoubtedly have a few more. Krystal D'Costa moved her blog to the SciAm network. Even my blog, which is part of the sample, moved! The ever-changing nature of the Web gives information scientists a lot of grief and makes it very hard to replicate results.

You may have noticed I didn't discuss the gender issue here (only 22% of the blogs had a female author). I'm saving it for a future post about women in science blogging.

References

Shema, H., Bar-Ilan, J., & Thelwall, M. (2012). Research Blogs and the Discussion of Scholarly Information. PLoS ONE, 7(5). PMID: 22606239

Groth, P., & Gurney, T. (2010). Studying Scientific Discourse on the Web Using Bibliometrics: A Chemistry Blogging Case Study. Proceedings of the WebSci10: Extending the Frontiers of Society On-Line.