Studying the human genome requires sequencing billions of base pairs. Tracking an epidemic involves elaborate computer simulations with multiple variables that influence how it spreads. One thing this research has in common? None of it would be possible without powerful computers. Today, biology often involves massive amounts of data that can only be analyzed by supercomputers. This type of big-data research has become so common that in 2015, a team of researchers posed a question: Is there any area of biology that doesn’t involve computation? Increasingly, it seems that the answer is a resounding no. Biologists are not only working in the field and at the lab bench; they are also using computers as tools for studying the living world.
The merger of biology and computer science has shifted the way that we understand our own place in the world as modern humans. Just consider how, in 2016, computer models were used to explore genetic information extracted from ancient bones to probe the extent of interbreeding between modern humans, Neanderthals and Denisovans. The computer, alongside genetic and anatomical information, helped researchers gain new insights into human evolution.
Nearly 20 years ago, researchers forecast this change and advocated for additional computer resources, sometimes called “cyberinfrastructure,” for biologists. Funding agencies began offering grants to help develop a new set of tools for the changing face of biology. Data sharing, data storage and high-performance computing were developed to facilitate work with large data sets. However, with the abundance of tools came an important question: do people in the field actually know how to use this technology?
This question was at the core of a survey my team conducted of biology researchers in the U.S. We wanted to know what today’s top researchers were doing with their data, how they were handling it, and if they felt prepared to do the kinds of computational tasks that have become so common in virtually all areas of biology.
The results of our survey, published in 2017, were astonishing. Of the 704 researchers who responded, nearly 90 percent were working with large data sets or would be soon. They worked with all kinds of data to answer questions in the life sciences, including DNA and RNA sequence data, images, phenotype information and even data collected from microscopic examination of samples. More than three-quarters (77 percent) reported working with more than one kind of data in their research.
It was evident that what biologists really needed now wasn’t more tools. Rather, they needed help developing their expertise in using available resources. In short, what they most wanted was training: training on how to use multiple types of data to draw conclusions, training on how to wrangle terabytes of data, and training on how to best use the cloud for high-performance computing.
Even as researchers advocate for more computational training, the good news is that many have already begun to integrate their existing skill set into undergraduate education. Taking cues from a 2009 report from the National Science Foundation and American Association for the Advancement of Science, educators are treating biology as the truly interdisciplinary science it is, drawing on computer science to improve the biology curriculum.
To continue advancing biological research, we need to provide opportunities to hone computational skills at every level, from high school onward. Including work with big data in the classroom, however, is just a start. Professional development opportunities, both in person and online, need to be widely available to help mid- and late-career researchers develop and maintain the skills that biological research requires, especially as computational tools such as artificial intelligence continue to evolve.
My research clearly shows that today’s biologists, as a group, want more training to improve their skills as computer-focused 21st-century researchers. The good news is that new approaches to training, including community-driven training and “hack weeks,” are continually being developed and improved to help scientists get better at data science.
Not only will further training make possible a wealth of new research, but there’s a bonus that shouldn’t be overlooked. If today’s biologists are properly trained in aspects of computational biology, they will be able to effectively prepare the next generations of researchers, who will likely be even more dependent on computers and cyberinfrastructure for their work.