Cracking Open the Black Box of AI with Cell Biology
A deep neural network that’s mapped to the innards of a yeast cell reveals its inner workings
The deep neural networks that power today’s artificial intelligence systems work in mysterious ways.
They’re black boxes: A question goes in (“Is this a photo of a cat?” “What’s the best next move in this game of Go?” “Should this self-driving car accelerate at this yellow light?”), and an answer comes out the other side. We may not know exactly how a black box AI system works, but we know that it does work.
But a new study that mapped a neural network to the components within a simple yeast cell allowed researchers to watch the AI system at work. And it gave them insights into cell biology in the process. The resulting tech could help in the quest for new cancer drugs and personalized treatments.
First, let’s cover the basics of the neural networks used in today’s machine learning systems.
Computer scientists provide the framework for a neural network by setting up layers, each of which contains thousands of “neurons” that perform tiny computational tasks. The trainers feed in a dataset (millions of cat and dog photos, millions of Go moves, millions of driver actions and outcomes), and the system links the neurons across layers into structured sequences of computations. The system runs the data through the neural network, then checks how well it performed its task (how accurately it distinguished cats from dogs, etc.). Finally, it adjusts the strengths of the connections between the neurons and runs through the dataset again, checking whether the new weights produce a better result. When the neural network can perform its task with high accuracy, its trainers consider it a success.
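The loop described above—run the data through, score the result, nudge the connections, repeat—can be sketched in a few lines. This is a toy single-layer example on made-up 2-D data standing in for cat and dog photos; it is illustrative only, not the code behind any of the systems in this article.

```python
import numpy as np

# Toy "dataset": 2-D points labeled by which side of a line they fall on,
# a stand-in for cat-vs-dog photos. (Illustrative only.)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# A single layer of connection weights; real networks stack many such layers.
w = np.zeros(2)
b = 0.0

def predict(X, w, b):
    # Logistic activation squashes raw scores into probabilities.
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

for step in range(500):
    p = predict(X, w, b)                 # run the data through the network
    error = p - y                        # check how well it performed
    w -= 0.1 * (X.T @ error) / len(y)    # nudge the connection weights...
    b -= 0.1 * error.mean()              # ...and repeat with the new weights

accuracy = ((predict(X, w, b) > 0.5) == y).mean()
```

After a few hundred passes, the weights settle into a pattern that separates the two classes—the "success" the trainers are checking for.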
These days, black box AI systems are accomplishing remarkable things. They are, just for starters, sorting cat photos for the Internet, beating grandmasters at the ancient game of Go, and sending self-driving cars speeding down highways.
Although they’re called neural networks, these systems are only very roughly inspired by human neural systems, explains Trey Ideker, a professor of bioengineering and medicine at UC San Diego.
“Look at AlphaGo [the program that beat the Go grandmaster]. The inner workings of the system are a complete jumble; it looks nothing like the human brain,” Ideker says. “They’ve evolved a completely new thing that just happens to make good predictions.”
Ideker, who led the new research on the AI for cell biology, set out to do something different. He wanted to use a neural network not just to spit out answers, but to show researchers how it reached those conclusions. And by mapping a neural network to the components of a yeast cell, his team could learn about the way life works. “We’re interested in a particular structure that was optimized not by computer scientists, but by evolution,” he tells IEEE Spectrum.
This project was doable because brewer’s yeast, a single-cell organism, has been studied since the 1850s as a basic biological system. “It was convenient because we had a lot of knowledge about cell biology that could be brought to the table,” Ideker says. “We actually know an enormous amount about the structure of a yeast cell.”
So his team mapped the layers of a neural network to the components of a yeast cell, starting with the most microscopic elements (the nucleotides that make up its DNA), moving upward to larger structures such as ribosomes (which take instructions from the DNA and make proteins), and finally to organelles such as the mitochondria and nucleus (which run the cell’s operations). Overall, their neural network, which they call DCell, makes use of 2,526 subsystems from the yeast cell.
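The key idea—neurons grouped into named subsystems whose states roll up the cell’s known hierarchy—can be illustrated with a toy version. The subsystem names below are made up for illustration (the real DCell has 2,526 of them), and a simple average stands in for the small neural layer each DCell subsystem actually contains.

```python
# Toy sketch of DCell's structure: each subsystem's state is computed from
# its children's states, mirroring the cell's hierarchy. Names are
# illustrative, not the real subsystems.
hierarchy = {
    "cell": ["nucleus", "mitochondrion"],
    "nucleus": ["ribosome_assembly", "dna_repair"],
    "mitochondrion": [],
    "ribosome_assembly": [],
    "dna_repair": [],
}

# Leaf subsystems read directly from the (mutated) genotype; here a leaf
# is simply 1.0 if intact, 0.0 if knocked out by a mutation.
knocked_out = {"dna_repair"}

def subsystem_state(name):
    children = hierarchy[name]
    if not children:
        return 0.0 if name in knocked_out else 1.0
    # DCell gives each parent its own small trained layer; averaging the
    # children is a crude stand-in for that computation.
    return sum(subsystem_state(c) for c in children) / len(children)

growth = subsystem_state("cell")
```

Because every intermediate value is attached to a named cell component, a researcher can inspect, say, `subsystem_state("nucleus")` and see exactly where a mutation’s effect propagates—this is what makes the network “visible” rather than a black box.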
DCell allows researchers to change a cell’s DNA (its genetic code) and see how those changes ripple upward to alter its biological processes and, ultimately, cell growth and reproduction. Its training data set consisted of several million examples of genetic mutations in real yeast cells, paired with information about the results of those mutations.
The researchers found that DCell could use its simulated yeast to accurately predict cell growth. And since it’s a “visible” neural network, the researchers could see the cellular mechanisms that were altered when they messed around with the DNA.
This transparency means that DCell could potentially be used for in silico studies of cells, obviating the need for expensive and time-consuming lab experiments. If the researchers can figure out how to model not just a simple yeast cell but also complex human cells, the effects could be dramatic. “If you could construct a whole working model of a human cell and run simulations on it,” says Ideker, “that would utterly revolutionize precision medicine and drug development.”
Cancer is the most obvious disease to study, because each cancer patient’s tumor cells contain a unique mix of mutations. “You could boot up the model with the patient’s genome and mutations, and it would tell you how quickly those cells will grow, and how aggressive that cancer is,” Ideker says.
What’s more, pharma companies searching for new cancer drugs use cell growth as the metric of success or failure. They look at a multitude of molecules that turn different genes on or off, asking for each: Does this potential drug cause the tumor cell to stop multiplying? With billions of dollars going to R&D for cancer drugs, an in silico shortcut has clear appeal.
Upgrading from yeast to human cells won’t be an easy task. Researchers need to gather enough information about human patients to form a training data set for a neural network—they’ll need millions of records that include both patients’ genetic profiles and their health outcomes. But that data will accumulate fairly quickly, Ideker predicts. “There’s a ton of attention going into sequencing patient genomes,” he says.
The trickier part is gathering the knowledge of how a human cancer cell works, so the neural network can be mapped to its component parts. Ideker is part of a consortium called the Cancer Cell Map Initiative that aims to help with this challenge. Cataloging a cancer cell’s biological processes is tough because the mutations don’t only switch cellular functions on and off, they can also dial them up or down, and can act in concert in complicated ways.
Still, Ideker is hopeful that he can employ a machine learning technique called transfer learning to get from a neural network that models yeast cells to one that models human cells. “Once you’ve built a system that recognizes cats, you don’t need to retrain the whole neural network to recognize squirrels,” he says.
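Transfer learning works by reusing a network’s already-trained layers and retraining only a small final portion for the new task. Here is a minimal sketch of that idea on synthetic data; the frozen weights, the “new task” labels, and all the shapes are invented for illustration, not drawn from any yeast or cancer model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend this layer was already trained on a big first task ("cats").
W_frozen = rng.normal(size=(4, 3))

def features(X):
    # The reused part of the network: frozen weights, ReLU activation.
    return np.maximum(X @ W_frozen, 0.0)

# New task ("squirrels"): synthetic labels defined on the frozen features,
# so the reused representation is genuinely useful for the new problem.
X_new = rng.normal(size=(100, 4))
true_w = np.array([1.0, -1.0, 0.5])          # hidden rule, for data generation
y_new = (features(X_new) @ true_w > 0).astype(float)

# Train ONLY a small new output layer on top of the frozen features.
w_head, b_head = np.zeros(3), 0.0
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(features(X_new) @ w_head + b_head)))
    err = p - y_new
    w_head -= 0.1 * (features(X_new).T @ err) / len(y_new)
    b_head -= 0.1 * err.mean()

acc = ((1.0 / (1.0 + np.exp(-(features(X_new) @ w_head + b_head))) > 0.5)
       == y_new).mean()
```

Only the small head is trained; the bulk of the network carries over unchanged, which is why the yeast-to-human jump Ideker describes would not require starting from scratch.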