
When should a scientist's data be liberated for all to see?

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American


When researchers make an exciting discovery, the data behind it are often closely guarded until they can be examined, developed and then revealed—at least in part—in a peer-reviewed journal with all of the proverbial fanfare.

But that custom often leaves the public and most of the research world in the dark, sometimes for years, as some lamented in the case of the formal description of the hominid Ardipithecus ramidus, which came some 15 years after the original discovery. Publication usually involves sharing some data because the scientific method encourages others to review one's work so they can attempt to replicate it. In a Web-driven era of rapidly moving and easily stored data, however, many researchers now argue forcefully for an open exchange of data and the wider use of so-called scientific commons.




Climate change, molecular chemistry and microbiology are just a few of the fields currently entertaining the idea of a better-connected repository to which data can (or must) be uploaded soon after discovery. And in the medical world, many researchers are looking hopefully toward a digital future in which masses of patient data can be examined for patterns of disease soon after they are gathered.

"It would be preferable, from a pure scientific advancement standpoint, to have every piece of data released immediately to the public," Jorge Contreras, deputy director of the Intellectual Property Program at Washington University's School of Law in St. Louis, Mo., and author of a new policy essay on the topic published online July 22 in Science, said in a prepared statement.

That idealistic approach, however, "doesn't give data-generating scientists the opportunity to publish and advance their careers through publication," he noted. Thus new findings and data sets are still usually held close to the vest in the harsh publish-or-perish world.

And the data dearth doesn't necessarily stop with publication. "Because of busy schedules, competitive pressures and other interpersonal vagaries, the sharing of scientific data can be inconsistent even after publication," Contreras observes in his essay.

Not every field has been so tight-fisted with its data. As an encouraging example, he points to the Human Genome Project's stipulation that all new data be made public within 24 hours of being generated. But, he concedes, not every discipline is primed to fall in line with such immediate free access. The genome "represented the common heritage of the human species and should not be encumbered by patents," he writes. But patents are precisely the point of many scientific endeavors, and showing your cards to the competition early on is a patently dim decision.

Thus Contreras proposes a balance of data access and data rights. "I think you must have a compromise," he said in a prepared statement. "Commons weighted too heavily in favor of data users are not likely to attract sufficient contributions from data generators, whereas commons weighted too heavily in favor of data generators" would be less helpful to other scientists and the public. 

But that doesn't mean data should be held back. Instead, he argues, widely accepted lead times—after data are publicly released but before others can publish results on them—would allow "data generators a 'head start' on preparing publications based on their data, yet data are still broadly available for the general advancement of science."

Image courtesy of iStockphoto/AlexRaths