
Opening a can of data-sharing worms


This article was published in Scientific American's former blog network and reflects the views of the author, not necessarily those of Scientific American.


Are researchers' dogs eating a lot of their homework? Well, yesterday afternoon at the quadrennial medical editors' scientific meeting in Chicago, we found out they kinda are.

Timothy Vines and colleagues did a study on how the availability of data sets in zoology changes over time. They gathered 516 papers published between 1991 and 2011. And then they tried to track the data down.

Even tracking down the authors was a challenge, never mind the actual data. The older the paper, the less likely it was to come with an author email address that still worked.




Vines' luck with the data themselves was even worse. Even for papers from 2011, only 37% of the data sets were still findable and retrievable, and the proportion dropped with each earlier year. By the time they got back to papers published in 1991, only 7% of the data sets could be confirmed to still exist and be retrievable. By then, few of the authors could be found, and most of those who could reported that their data were lost or inaccessible.

Researchers who had held the data had died or retired, or the research had been done five computers and two universities ago. Or the data were in software or on hardware that no one could access any more. As the stories and reasons kept coming, we were all wincing and more or less freaked out - partly in personal recognition of life as we all know it, and partly at seeing the collective scale of this problem tabulated. Human research in areas where data must be kept might fare better, but who knows? Vines thinks that years from now people will look back and think it was silly not to publish data at the same time as the article.

The following speaker added further cheery news: Christine Laine from the Annals of Internal Medicine told us that between 2008 and 2012, researchers' willingness to share their data had actually decreased. So although the journal has the admirable practice of including a reproducibility statement with each article, researchers who want to replicate a study will still often have trouble getting the precise details they need.

Some people definitely don't want to share. Kay Dickersin told one such story, about the way one drug company handled unwelcome data about one of its products. You can read about it here. Fair warning, though: you mightn't want to read it before bedtime - it's quite scary.

Dickersin pointed out that the problem of hidden trial data is particularly bad for industry studies of off-label use of drugs. No trials have to be submitted to the FDA or other drug regulatory authorities for those uses, cutting off one of the major sources of data.

Dickersin was delivering a tour de force annual EQUATOR lecture at the end of the day yesterday. A key message came early: "We must agree on the balance between scientific trust and scientific accountability," she said. "It's not just that the studies aren't reported - the investigators aren't telling the whole story."

The problem runs through the whole health and science ecosystem, Dickersin pointed out. Whether it's academic researchers, industry, or clinicians, fears about legal implications drive all sorts of behavior, including withholding data: "We've been too industry-focused: academics are resisting this too." People's unpreparedness for high-quality, sustainable data-sharing practices needs to be taken seriously - which means working to resolve the content, ethical, and practical problems standing in the way.

Dickersin was concerned that researchers have to "become detectives" to find out treatment effects and resolve discrepancies between different sources of data: "There needs to be more scrutiny by regulators."

Two particular examples of good practice in sharing clinical trial information were highlighted: the extensive processes advanced by the NIH's NICHD (the Eunice Kennedy Shriver National Institute of Child Health and Human Development), including detailed data dictionaries that explain the content and technical detail of the data; and YODA, the Yale University Open Data Access project, which got a lot of attention in June with the publication of its project on Medtronic's biological agent to promote bone growth.

There was a lot of passion in the room on this: discussion ran well over time. Safe and valuable data-sharing and preservation are critical. But as we've seen with other areas dealing with the same issues, like genomics, it's going to take a lot of effort. And a lot of parts of the system are involved. As Dickersin said last night: "The whole system is depending on the rest of the system to work."

~~~~

From day 1: "Bad research rising"

From the morning of day 2: "Academic spin"

As you would expect from a congress on biomedical publication, there’s a whole lot of tweeting going on. Follow on #PRC7

The cartoon is by the author, under a Creative Commons, non-commercial, share-alike license. Photo of Kay Dickersin by the author.

The thoughts Hilda Bastian expresses here are personal, and do not necessarily reflect the views of the National Institutes of Health or the U.S. Department of Health and Human Services.

Hilda Bastian was a health consumer advocate in Australia in the '80s and '90s. Controversies riddled with ideology and vested interests drove her to science. Epidemiology and effectiveness research have kept her hooked ever since.
