I've recently been working on a new project with Ellie Harmon about dirt. Ellie hiked the Pacific Crest Trail last year, the 2,663 miles from the US border with Mexico to the border with Canada. She collected dirt throughout California and sent it to me in the lab, a total of 62 samples that represent a wide diversity of California's landscape.
Over the past few months, we transformed the dirt into data. First, we isolated DNA from each sample using the MoBio PowerSoil DNA Isolation Kit. In each step of the procedure, more and more of the non-DNA material in the dirt is stripped away, until at the end the diversity of colors and textures in the dirt is transformed into tiny volumes of clear liquid.
We sent the DNA to a company that could amplify and sequence all the 16S rRNA genes in each sample. 16S rRNA is a part of the bacterial ribosome, a sequence shared by all bacteria that is commonly used to identify closely related species. The company sent us back a huge amount of raw sequence data and a summary table of the relative abundance of each species in each sample.
These diagrams give a good general picture of the diversity and distribution of bacteria across the different samples, but they hide a lot of the work that goes into processing and interpreting the sequence data. The concept of "species" itself poses some deep philosophical questions when it comes to microbes. If we define species boundaries as separating groups of organisms that can't interbreed, can asexual organisms be thought of as a species? With so much horizontal gene transfer between different individual bacteria, how stable is species identification? Is the 16S sequence a good proxy for species differences? Often when looking only at 16S sequences, researchers will refer to "operational taxonomic units" (OTUs), clusters of similar sequences identified algorithmically, rather than "species," a term that brings along so much more philosophical baggage.
The heatmaps below show some of the differences between our data when represented as OTUs vs. species. The top part represents all of the OTUs for one of our samples, around 13,000 different sequence clusters, each represented by a square. Representative sequences from each OTU can then be matched to the database of 16S genes, providing the final list of species that we used to make the circle diagrams, about 1,200 species represented in the bottom diagram. OTUs that don't match the database are colored orange; in the first row you can see that they are spread throughout the data but then condense into a single square in the species data. Overall, the number of defined "species" is more than 10X smaller than the number of OTUs because of these "no hits" and because often several different OTUs will match the same species.
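To make that collapsing step concrete, here is a minimal sketch in Python. It is not the pipeline the sequencing company ran; the OTU names, read counts, species matches, and the `collapse_to_species` helper are all made up for illustration. It only shows how several OTUs can fold into one species and how every unmatched OTU ends up in a single "No hit" group.

```python
from collections import Counter

# Hypothetical OTU table for one sample: OTU id -> read count.
otu_counts = {
    "OTU_1": 120,
    "OTU_2": 45,
    "OTU_3": 30,
    "OTU_4": 5,
    "OTU_5": 12,
}

# Hypothetical database matches: OTU id -> species name,
# or None when the representative sequence has no match.
otu_to_species = {
    "OTU_1": "Bacillus subtilis",
    "OTU_2": "Bacillus subtilis",   # a second OTU hitting the same species
    "OTU_3": None,                  # no hit
    "OTU_4": "Streptomyces griseus",
    "OTU_5": None,                  # no hit
}


def collapse_to_species(otu_counts, otu_to_species, no_hit_label="No hit"):
    """Sum OTU read counts by matched species; pool unmatched OTUs together."""
    species_counts = Counter()
    for otu, count in otu_counts.items():
        species = otu_to_species.get(otu) or no_hit_label
        species_counts[species] += count
    return dict(species_counts)


species_counts = collapse_to_species(otu_counts, otu_to_species)

print(len(otu_counts), "OTUs ->", len(species_counts), "species-level groups")
for name, count in sorted(species_counts.items(), key=lambda kv: -kv[1]):
    print(name, count)
```

In this toy example five OTUs collapse into three groups, for the same two reasons described above: two OTUs share a species, and the two unmatched OTUs merge into one "No hit" entry.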
Together, all this data offers a snapshot of the microbial biogeography across California, but more importantly shows the process of transforming the incredibly diverse and heterogeneous populations of microbes in the dirt into datasets that can be visualized and compared. Last week, Ellie and I put together the dirt, the data, and photographs of the process into an exhibition at UCLA's Art|Science gallery. For us, presenting this pretty straightforward scientific experiment as artwork let us ask different questions about laboratory methods, the aesthetics of data visualization, and how we make scientific knowledge. Taking this scientific process out of the laboratory context and into the gallery didn't take away from the results, but gave us a richer context for understanding what those results might or might not be able to tell us about nature, science, and the dirt underneath our feet.
*This project was supported by the University of California Institute for Research in the Arts. Special thanks to Ann Hirsch, Kavita Philip, Victoria Vesna, Nick Seaver, Luke Olbrish, Beth Reddy, Maskit Maymon-Schiller, Mick Lorusso, Marissa Clifford, Dawn Faelnar, Otherworld, Mike Bostock, Research and Testing Laboratory, and the US Postal Service.