Scientists give you all their data in the hopes that you will outsmart them

So you want to be a scientist? Here is your chance. We are going to release one of the largest datasets on the microbes of our skin ever collected. We are going to release it right now, before we publish it. We are going to release it so you—the person out there, wherever “there” is—can come up with new hypotheses and even analyses. We are going to release it because we think that you are, collectively, much smarter than we are.

Image 1. Time to get your inner Rodin on. Hey, is he looking at his belly button? (Photo: Wikimedia Commons)

What we have on offer here is a new approach, one that might fail but is worth trying. When we learn about science in school, we typically learn the “standard method” of doing science. In the standard method, there are four steps…

STEP 1- Scientists begin by developing hypotheses, or ideas that can be tested. Hypotheses, history teaches us, come from conversations with friends, published papers, existential crises, bolts of mental lightning, or fishing trips.

STEP 2—Experiments or observations are then designed to test those hypotheses. Experiments should be designed to perfectly test the hypotheses with every possible contingency in mind. Cost is no obstacle.

STEP 3—Once the project is designed, the scientist gathers data. Data must be gathered even if the scientist becomes bored. They must be gathered even once the scientist becomes interested in something totally different. They must be gathered and gathered until the scientist becomes ornery, disillusioned or forlorn.

STEP 4—Once data have been collected, scientists must refrain from guessing what the data show and wait until analyses are performed. Statistical analyses are used to reject each bad and inelegant hypothesis until one is left with something approaching, as close as is possible, the truth. But the truth is never a complete truth. So upon encountering a truthy thing, one comes up with more hypotheses and repeats the whole process until retirement or until everything in the universe is understood, whichever comes first.

This is, indeed, science. The textbooks reminding students of these steps are right, but they are only partially right. Science can also be done many other ways. No two Nobel Prize winners, for example, do science quite the same way, nor does anyone else. Even in my own relatively small biology department, scientists do science in many different ways (which is one of the reasons we sometimes disagree, when hiring new colleagues, about which candidate is great and which is terrible). The same has been true in every department I have ever worked in or visited. There are many scientific methods and many successful exceptions to the rules.

One of the exceptions I have long been interested in is just who really performs each step in this (or any) scientific process. Essentially since the origins of science, scientists have convinced other people to do their data collection for them.

When the people gathering the data are unpaid and taking classes, we call them students. When they are poorly paid and not taking classes, we call them technicians. When they are totally unpaid and also have day jobs, we call them citizen scientists or volunteers. The bigger a question gets (or the busier a scientist gets with non-science responsibilities), the more important these data gatherers become.

Of course, in each of these cases there is an upside. For students, gathering data is the gateway drug to designing their own projects and developing their own hypotheses. For citizens, gathering data allows them to be part of the bigger endeavor of science and, as individuals in something larger, they often don’t get the full tedium of collecting all the data. As for the technicians, well, bless them.

Interestingly, while there is a long history of scientists enlisting other people in data collection, there is a far shorter history of public participation in the other steps of science. These other steps include the parts of science that I think are, almost inarguably, the most fun: coming up with hypotheses and analyzing which hypotheses can be rejected. They are also the two steps in the scientific process where an individual scientist seems most likely to need help. This is certainly the case in our recent study of skin bacteria.

As I mentioned a few days ago, we have been struggling to explain differences among individual humans in terms of the bacteria found on their skin. Everyone’s skin is covered in bacteria (the absence of skin bacteria is never an option; you are cloaked in life, whether you like it or not). The question is just which ones we have and why. We have come up with a number of hypotheses that might explain the number of bacteria and the kinds of bacteria found on any particular person. We thought gender might be important, or age, or whether a person’s belly button is an innie or an outie.

Using our initial dataset of 60 people, we tested a number of these hypotheses, but so far none of them seems to be supported by the data (one might say they are all rejected, except that with just sixty people, a rejected hypothesis might just be one for which we have too few data to test). Now we have a larger dataset, which includes over 150 belly buttons from around the world.

What we would like to do now is to enlist your (unpaid) help in generating hypotheses. This has, amazingly, already begun in earnest. I am being emailed about ten new hypotheses a day and I love them! Please add your ideas to the list. But we would also like to enlist your help in analyzing the data and visualizing just what is going on. Lend us your late-night ideas, but also your analytical and artistic skills.

In fact, we’ve created a new space on our website for you to share your ideas and show off your talents. There you’ll also find a downloadable data file. It includes data associated with each person in the study (except their names and other identifiers) along with the data on the species (technically OTUs, or operational taxonomic units, based just on the genes of the critters we find) found in their belly buttons.
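If you are wondering what an analysis of a file like this might look like, here is one minimal sketch in Python. It is only an illustration under stated assumptions: the records, the field names `navel` and `otus`, the OTU labels and every count below are made up, and the real file may be laid out quite differently. The sketch counts how many OTUs each person carries (their "richness") and then uses a permutation test, one of many possible choices, to ask whether innies and outies differ more than shuffled labels would suggest.

```python
import random


def richness(otu_counts):
    """Number of OTUs with a nonzero count for one person."""
    return sum(1 for count in otu_counts.values() if count > 0)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Intended for integer-valued measurements such as richness.
    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the group labels and recompute the difference.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical toy records: NOT real data and NOT the real file layout.
people = [
    {"navel": "innie", "otus": {"OTU_1": 12, "OTU_2": 0, "OTU_3": 4}},
    {"navel": "innie", "otus": {"OTU_1": 3, "OTU_2": 7, "OTU_3": 1}},
    {"navel": "innie", "otus": {"OTU_1": 0, "OTU_2": 0, "OTU_3": 9}},
    {"navel": "outie", "otus": {"OTU_1": 5, "OTU_2": 2, "OTU_3": 0}},
    {"navel": "outie", "otus": {"OTU_1": 0, "OTU_2": 8, "OTU_3": 0}},
    {"navel": "outie", "otus": {"OTU_1": 1, "OTU_2": 1, "OTU_3": 6}},
]

innies = [richness(p["otus"]) for p in people if p["navel"] == "innie"]
outies = [richness(p["otus"]) for p in people if p["navel"] == "outie"]

difference, p_value = permutation_test(innies, outies, n_permutations=2_000)
print(f"mean richness difference (innie - outie): {difference:.2f}")
print(f"permutation p-value: {p_value:.3f}")
```

With only a handful of people per group, a test like this has very little power, which is the same caution that applies to our first sixty samples: a large p-value may simply mean too few data.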

We aren’t sure what will happen when we open these data up for public analysis. In the traditional approach to science, it is not clear if we can (or how we would) publish the best analyses of these data in a scientific journal. The journals are not built in such a way as to allow what we are doing here. But maybe that doesn’t matter. After all, we have saved ourselves some fun parts of science too. You see, in the next months we will have data from an additional four hundred people from all over North America. When those data are in, we can follow up on whatever hypotheses and analyses emerge here. We can, in other words, start the scientific process anew, building upon whatever you figure out.

With that we leave you to it: the flash of ideas, the art of visualization and the clarity of fine analyses. So often citizen scientists are given the dregs to work with, the hard samples to sort. We are giving you the good stuff so that you can participate in the fun too. The only question is whether you will. Well, that and what the heck is going on with the mysteries in belly buttons.