April 16, 2020

AI Will Help Scientists Ask More Powerful Questions

Self-learning systems can discover hidden patterns in immense data sets, transcending what humans could ever find on their own

By Pushmeet Kohli

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American

Scientific advances over the last several centuries have not only resulted in a greater understanding of the universe; they’ve raised the standard of living for many people across the globe. However, there are still massive challenges we’re ill equipped to meet, as evidenced by climate change and the COVID-19 pandemic, which have shown that we have yet to understand the complexity of nature. To address the scale of the problems now facing humanity, radical solutions are needed, and scientific breakthroughs will be central to this process. Artificial intelligence promises to accelerate fundamental discoveries by deepening the nature of the questions researchers can ask.

In his visionary essay “As We May Think,” published in 1945, the prominent American engineer and science advocate Vannevar Bush predicted that people would soon need to rely on external devices to augment their minds. Even then, he could see that the rate of scientific discovery was so great that the need to store, process and understand information already exceeded people’s biological capacity.

His prescient observation rings truer than ever: one of the challenges of modern science is to make sense of the vast amount of information we’ve gathered about the world. Given the scale of data generated by science, from the Large Hadron Collider to massive genome projects, it’s impossible for any individual person to parse it all. AI stands to help us turn this abundance of information into understanding, enabling us to ask questions that would be intractable for individuals to answer.


Scientists have long used computing to advance science, employing computer programs to model and simulate natural systems in order to explain and understand scientific phenomena. This approach has been incredibly fruitful, and has fueled advances ranging from simulations of atoms to models of the universe. However, this classical paradigm is limited by its reliance on human programmers, who must first distill rules from theories and observations, then use these rules to code a program’s behaviors. Our hope is to use AI systems to deduce such rules directly from data or experience, and potentially go beyond what individual researchers might decipher. These self-learning systems can explore potential solutions and strategies by discovering hidden properties of the underlying structure of immense data sets, and may therefore augment human understanding rather than be limited by it.

A crucial point, then, is finding the right problems for these systems to explore. Though a great deal of work is being done applying AI to the sciences, a direct application of these technologies will not necessarily (and need not) result in a breakthrough on every scientific problem. The most impactful advances will come from applying AI techniques to questions that really matter to society, and for which sophisticated reasoning and analysis abilities are required. Much of the art of solving a problem lies in picking the right question in the first place.

For example, one of the most important open questions in biology is understanding how proteins take their shapes. Proteins are essential to the body’s healthy operation, and act like miniature machines within cells to carry out the many tasks of living. A protein’s shape dictates its function, which is why so many research groups are dedicated to discovering the structure of different proteins; once a protein’s shape is known, researchers can better understand how it works, and screen for drugs that interact with it when it malfunctions in diseases. It so happens that this is a perfect application for AI, because we have relatively large data sets of known protein structures to train systems on, and this is a problem for which we can quantify progress.

Scientists might spend years working out the shape of a single protein using time-consuming experimental methods like crystallography. Instead of working out the shape of one protein at a time, what if we could use existing data to teach an AI system how to predict the shape of any naturally occurring or even theoretically possible protein just from its amino-acid sequence? Based on learning techniques inspired by neuroscience, our recently published AlphaFold model can train on large data sets of known protein structures to predict how a one-dimensional string of amino acids folds into a three-dimensional shape.

Using this system, we recently generated predictions for the shapes of six proteins associated with SARS-CoV-2, the virus that causes COVID-19. While the structures predicted by our method don’t directly lead to a cure, they may provide useful hints to researchers working on drugs and antibodies that could work against the virus, and may add to our understanding of this global health threat.

Many academic groups have been making steady progress on the folding problem for years, as evidenced by accuracy improvements in CASP, a biennial protein folding prediction competition. In 2018, AlphaFold took top honors at CASP13, representing a 40 percent improvement in accuracy over the previous competition’s best model. In the future, this approach could help scientists focus on the most promising leads, saving time and money, for example, in the notoriously expensive drug development process. Through AI-driven simulations, it may be possible to design novel proteins in silico, then test them in the real world, helping researchers direct research efforts and funding more efficiently.

This is the beauty of AI: it will enable abstraction from the particular to the general, distilling unifying principles from experience. It deepens the nature of questions scientists can ask: not simply “what is the shape of protein X?” but, more fundamentally, “what dictates the shape of any protein?” Going after a question like this doesn’t contribute one answer, but many, opening up entire new fields of inquiry.

If we can make sufficient progress on predicting how proteins take their shapes, we might make it easier to design new drugs, enzymes and universal vaccines, leading to countless social benefits. Similarly, if we can use AI to faithfully simulate collections of atoms, it might be possible to rationally design new materials for batteries, solar energy technology, carbon capture and more. Given the right question, the right training data and the ability to quantify learning, AI systems stand to deepen our scientific understanding and accelerate new technological breakthroughs. AI is much more than automating image classification or streamlining supply chains; we want to use it to discover new knowledge about the universe, and use that understanding to better the world.