By Audrey Ko
Proteins are macromolecules that are essential for many biological functions within our body. The specific function of a protein depends on its unique 3D structure, which results from the folding of one or more chains of amino acids. Over the past decades, researchers have developed experimental techniques for determining the 3D structures of proteins, such as X-ray crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy (Alberts et al., 2002). However, these methods are very costly and time-consuming (Deepmind, 2020a). Researchers are therefore turning to alternative ways of studying the shapes of proteins, in particular computational methods.
Naturally, one might consider searching over all the possible folding configurations of a protein, given its sequence of amino acids, but this is in fact impractical. As discussed by Levinthal (1969), the time required by such a random search to reach the correct 3D structure would be astronomically long, whereas proteins themselves fold into their native 3D structure spontaneously on a timescale of milliseconds. This “Levinthal paradox” explains why a random search is not useful for this problem, and it gives rise to one part of the bigger “protein folding problem”: how one can predict the native structure of a protein from its constituent amino acids.
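Levinthal's argument can be made concrete with a back-of-the-envelope calculation. The Python sketch below is illustrative only: the chain length of 100 residues, the 3 conformations per residue, and the sampling rate of 10^13 conformations per second are assumed round numbers rather than values taken from this article, and the function name is hypothetical.

```python
# Rough estimate of how long an exhaustive random search over
# protein conformations would take. All default values are
# illustrative assumptions, not measured quantities.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_years(n_residues, states_per_residue=3,
                           samples_per_second=1e13):
    """Return the years needed to visit every conformation once."""
    # Each residue is assumed to adopt its states independently,
    # so the number of conformations grows exponentially.
    n_conformations = states_per_residue ** n_residues
    seconds = n_conformations / samples_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = levinthal_search_years(100)
    print(f"Exhaustive search would take about {years:.1e} years")
```

Under these assumptions the search time comes out on the order of 10^27 years, vastly longer than the age of the universe, whereas real proteins fold in milliseconds.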
The “recipe” of protein folding remained an open question for many years until last year, when Deepmind’s deep-learning-based AlphaFold 2 model appeared. In the 14th Critical Assessment of Protein Structure Prediction (CASP14), a global biennial competition that aims to advance research in this field, AlphaFold 2 was ranked first with an outstanding performance in correctly predicting protein structures (University of California, Davis, 2020). Crucially, CASP chooses as targets protein structures that have very recently been experimentally determined, and the predictions of the different models are then compared with the actual experimental data (The AlphaFold Team, 2020). During CASP14, AlphaFold 2 did more than come top among around 100 participants: it obtained an overall median global distance test (GDT) score of 92.4 across all targets, outperforming its predecessor, AlphaFold, and all other competitors in this assessment (The AlphaFold Team, 2020). Moreover, a score above 90 is informally considered competitive with experimental results, since it corresponds to an average error on the length scale of about the width of an atom (The AlphaFold Team, 2020). Such a high score marks a breakthrough in the real-world application of computational protein structure prediction and, ultimately, in the resolution of the protein folding problem. In addition, Deepmind used only 16 third-generation tensor processing units (TPUv3), roughly equivalent to 100-200 graphics processing units (GPUs), run over a few weeks. This is in stark contrast to the costly traditional experimental techniques, which can take years.
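To give a feel for what a GDT score measures, the Python sketch below computes the commonly used GDT_TS variant: the average, over cutoffs of 1, 2, 4 and 8 ångströms, of the percentage of residues whose predicted position lies within that cutoff of the experimental position. It is a simplification under stated assumptions: it takes per-residue distances as already computed from a single fixed superposition, whereas the full procedure searches over many superpositions, and the function name and example distances are hypothetical.

```python
# Simplified GDT_TS: mean percentage of residues within each
# distance cutoff. Assumes the predicted and experimental
# structures are already superimposed and that `distances`
# holds one distance (in angstroms) per residue.


def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Return a score between 0 and 100 (higher is better)."""
    if not distances:
        raise ValueError("distances must not be empty")
    n = len(distances)
    # Fraction of residues that fall within each cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Made-up distances for a five-residue toy example.
    example = [0.5, 0.8, 1.5, 3.0, 9.0]
    print(f"GDT_TS = {gdt_ts(example):.1f}")
```

A prediction in which every residue lies within 1 ångström of its experimental position scores 100, while one in which no residue lies within 8 ångströms scores 0.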
To achieve this success, Deepmind improved on the deep learning architecture of AlphaFold 2’s predecessor, AlphaFold, which was itself ranked first in CASP13 (University of California, Davis, 2020). They created an attention-based neural network system, trained end-to-end, to interpret the structure of folded proteins (The AlphaFold Team, 2020). They also used evolutionarily related sequences, multiple sequence alignment (MSA), and many other treatments and computational techniques. Through iteration and training, the model eventually developed into what is now known as AlphaFold 2, with the additional feature of estimating the reliability of each part of its predicted structure (The AlphaFold Team, 2020). However, as Deepmind has not yet published its methods paper in a peer-reviewed journal, the exact details of the implementation and development are still unknown.
While the results Deepmind achieved are very encouraging, there are a few concerns regarding the model’s practical application. Firstly, AlphaFold 2 achieved a median score above 90, but its scores for some targets were below 90, which undermines its use as a generic model for predicting any protein structure. Secondly, it remains unclear whether AlphaFold 2 can also determine structural outcomes that result from unknown folding mechanisms, that is, whether it suffers from overfitting in statistical terms. Furthermore, AlphaFold 2 addresses only the prediction puzzle within the bigger “protein folding problem”. It is not enough simply to predict what a protein folds into; it is equally, if not more, important to understand how it folds and what functions it can exhibit in order to solve the protein folding problem in full. Like the details of implementation and development, many of these concerns remain unanswered and await further explanation in Deepmind’s paper, which is under peer review.
Despite the reservations one may have, AlphaFold 2 has already been applied successfully. It was used by Deepmind (2020b) to predict structures for some of the proteins in SARS-CoV-2. In particular, it was confirmed that the model provided the correct prediction for the spike protein structure shared in the Protein Data Bank (PDB), marking yet another potential use of the model in helping researchers further understand viral proteins. In summary, AlphaFold 2 points to a promising future for computational prediction of the 3D structures of proteins and represents a major milestone towards resolving the protein folding problem completely. While there is still debate and criticism over its limitations, it remains one of the most promising tools in structural bioinformatics.
Alberts, B., Johnson, A., Lewis, J., Morgan, D., Raff, M., Roberts, K. & Walter, P. (2002) Molecular Biology of the Cell. 4th edition. Garland Science, New York.
Deepmind. (2020a) AlphaFold: Using AI for scientific discovery. Available from: https://deepmind.com/blog/article/AlphaFold-Using-AI-for-scientific-discovery [Accessed 20th June 2021].
Deepmind. (2020b) Computational predictions of protein structures associated with COVID-19. Available from: https://deepmind.com/research/open-source/computational-predictions-of-protein-structures-associated-with-COVID-19 [Accessed 20th June 2020].
Levinthal, C. (1969) How to Fold Graciously. Mossbauer Spectroscopy in Biological Systems: Proceedings of a meeting held at Allerton House. Available from: https://www.cc.gatech.edu/~turk/bio_sim/articles/proteins_levinthal_1969.pdf [Accessed 20th June 2021].
The AlphaFold Team. (2020) AlphaFold: a solution to a 50-year-old grand challenge in biology. Available from: https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology [Accessed 20th June 2021].
University of California, Davis. (2020) Home – Prediction Center. Available from: https://predictioncenter.org [Accessed 20th June 2021].