DeepMind AlphaFold
AlphaFold predicts 3D models of protein structures and is helping accelerate biological research
AlphaFold
AlphaFold is a deep learning system that predicts a protein’s 3D structure from its amino acid sequence. OptimalAI's David Jones worked with Google DeepMind to create the system, which was initially trained on 100,000 established protein sequences and structures.
Before AlphaFold, the conventional methods of protein structure determination entailed arduous and time-intensive experiments that spanned years and incurred substantial costs. The latest AlphaFold system predicts protein shapes with atomic precision in a matter of minutes.
In July 2021, DeepMind released AlphaFold's source code as open source, making it freely available to the whole world. AlphaFold is expected to revolutionise our understanding of biology.
Google DeepMind's Sir Demis Hassabis and Dr. John Jumper were co-awarded the 2024 Nobel Prize in Chemistry for their work developing AlphaFold.
Collaboration and Mentorship
OptimalAI's David Jones played a significant role in the development of AlphaFold. As the creator of widely used protein structure prediction tools like PSIPRED and DISOPRED, Jones' work directly influenced AlphaFold's design and accuracy.
Leveraging his deep expertise in bioinformatics and protein folding, Jones worked closely alongside the DeepMind team to develop and refine AlphaFold’s algorithms. His contributions were instrumental in AlphaFold's groundbreaking success, solving a longstanding challenge in molecular biology.
What are Proteins?
The Protein Folding Problem
Proteins acquire their 3D structure by folding, a process driven by attraction and repulsion among their amino acids, of which there are 20 types. This intricate folding creates the distinctive curls, loops, and pleats that define a protein's shape. Each protein has a distinct three-dimensional structure, which governs its function. However, determining the precise structure of a protein has historically been costly and time-intensive.
For decades, researchers have pursued a reliable means to deduce a protein's structure solely from its amino acid sequence. This is known as the protein-folding problem. Before AlphaFold, scientists had determined the three-dimensional structure of only a minute fraction of known proteins. Predicting the structures of millions of uncharacterized proteins would aid in combating diseases, expedite the discovery of novel medicines, and shed light on the workings of life itself.
The Algorithm
AlphaFold uses a form of deep learning called an attention-based neural network, which is trained end-to-end. This means that all of the steps in the pipeline are trained together rather than one after another. The attention mechanism gives the network more flexibility, allowing it to dynamically learn interactions between non-neighboring nodes. To refine its predictions, AlphaFold also uses evolutionarily related protein sequences in the form of a multiple sequence alignment, together with a representation of amino acid residue pairs.
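To make the attention idea above concrete, here is a minimal sketch of generic scaled dot-product attention in plain Python. It is not AlphaFold's actual architecture; the function names and the five-residue toy features are hypothetical, chosen only to show that a position can attend strongly to a position far away in the sequence.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Generic scaled dot-product attention over lists of vectors.

    Every query is compared with every key, so position i can draw on
    position j no matter how far apart they are in the sequence.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        # Weighted average of the value vectors, one component at a time.
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Hypothetical toy features for five residues: residues 0 and 4 are
# similar to each other even though they sit at opposite ends of the chain.
features = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [1.0, 0.0],
]

outputs, weights = attention(features, features, features)
for i, row in enumerate(weights):
    print(i, [round(w, 3) for w in row])
```

In the printed weights, residue 0 gives more weight to residue 4 than to its immediate neighbor, residue 1. That is the sense in which attention can pick up interactions between non-neighboring positions.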
Training was conducted on between 100 and 200 GPUs and took weeks, after which the program would take a matter of days to converge for each structure. The program was trained on over 170,000 proteins from a public repository of protein sequences and structures. AlphaFold offers two reliability metrics that provide confidence estimates for the predicted protein structure. These tell scientists how accurate the predictions are likely to be.
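As an illustration of how such a confidence estimate can be used in practice, the sketch below reads a per-residue score from PDB-format text. It relies on a detail not stated in the text above: AlphaFold model files store the per-residue confidence score (pLDDT, on a 0 to 100 scale) in the B-factor column of each ATOM record. The helper names, the synthetic three-residue sample, and the cut-off of 70 are hypothetical and for illustration only.

```python
def atom_line(serial, name, res_name, chain, res_num, x, y, z, b_factor):
    """Build one fixed-width PDB ATOM record (used here for synthetic data)."""
    return ("ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}").format(
                serial, name, res_name, chain, res_num, x, y, z, 1.00, b_factor)

def plddt_per_residue(pdb_text):
    """Return {(chain, residue_number): score} from C-alpha ATOM records.

    PDB is a fixed-width format: atom name in columns 13-16, chain in
    column 22, residue number in columns 23-26, B-factor in columns 61-66.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            res_num = int(line[22:26])
            scores[(chain, res_num)] = float(line[60:66])
    return scores

# Synthetic sample, not taken from a real prediction.
sample = "\n".join([
    atom_line(1, " N", "MET", "A", 1, -8.901, 12.345, 3.210, 92.10),
    atom_line(2, " CA", "MET", "A", 1, -7.512, 12.001, 3.544, 93.50),
    atom_line(3, " CA", "GLY", "A", 2, -4.120, 10.870, 2.115, 64.25),
    atom_line(4, " CA", "SER", "A", 3, -1.034, 9.456, 0.872, 41.00),
])

scores = plddt_per_residue(sample)
mean_score = sum(scores.values()) / len(scores)
# Illustrative cut-off: flag residues whose score falls below 70.
low_confidence = [key for key, s in scores.items() if s < 70]

print("mean score:", round(mean_score, 2))
print("low-confidence residues:", low_confidence)
```

A summary like this lets a reader decide which regions of a predicted structure to trust before building further analysis on them.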
At CASP14 (2020), the latest iteration of AlphaFold was introduced to the protein-folding research community and tested against real experimental data. It attained an accuracy level that was deemed to address the protein structure prediction problem.
Professor Ewan Birney
Deputy Director General, EMBL and Director, EMBL-EBI
AlphaFold Protein Structure Database
In 2021, DeepMind and EMBL-EBI launched the AlphaFold Protein Structure Database and made it free and openly available to the scientific community. The database contains over 200 million predicted protein structures, covering almost every organism on Earth that has had its genome sequenced. This includes the entire human proteome, plants, bacteria, animals, and other organisms, opening up new avenues of research across the life sciences that will have an impact on global challenges, including sustainability, food insecurity, and neglected diseases.
Subsequent updates have added UniProtKB/SwissProt and 27 new proteomes, 17 of which represent neglected tropical diseases that continue to devastate the lives of more than 1 billion people globally. AlphaFold has also shown impact in areas such as improving our ability to fight plastic pollution, gain insight into Parkinson’s disease, increase the health of honey bees, understand how ice forms, tackle neglected diseases such as Chagas disease and leishmaniasis, and explore human evolution.
In 2024, Google DeepMind launched AlphaFold Server, a free research tool powered by AlphaFold 3. AlphaFold Server is the most accurate tool in the world for predicting how proteins interact with other molecules throughout the cell. With just a few clicks on a single platform, biologists can generate molecular complexes – regardless of their access to computational resources or their expertise in machine learning.