How can graphs help us solve problems underpinning biology?

By Anjali Samra

Understanding biological processes requires knowledge of molecules and the relationships between them. Graphs can model, for example, protein-protein binding interactions within different molecular pathways, and so can be used to investigate the relationships between different structures and their properties. This is particularly important in the analysis of disease pathways and in drug discovery (Sun et al., 2020; Yang et al., 2020).

Graphs are a natural way to model entities and their interactions (Muzio et al., 2021). Graph neural networks (GNNs) are a type of neural network that operates on graph-structured data. They can be applied to many challenges in bioinformatics, including predicting a protein's function from its structure, predicting how a drug may affect biological pathways, and capturing gene expression relationships (Eetemadi and Tagkopoulos, 2019; Muzio et al., 2021). Any biological entity can be represented as a node, and the relationships between entities as edges, within a graph (Gramatica et al., 2014).
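To make the node-and-edge idea concrete, here is a minimal sketch in plain Python. The protein names, the interactions and the two-number feature vectors are all invented for illustration. The update step only averages each node's features with those of its neighbours, which is the aggregation idea behind GNNs; a real GNN would also apply learned weights and a non-linearity at each layer.

```python
# Hypothetical toy protein-protein interaction graph: nodes are proteins,
# edges are binding interactions. Names and features are made up.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
features = {
    "P1": [1.0, 0.0],
    "P2": [0.0, 1.0],
    "P3": [1.0, 1.0],
    "P4": [0.0, 0.0],
}


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set of neighbours mapping."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def message_passing_step(adjacency, features):
    """Replace each node's features with the mean over itself and its neighbours."""
    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in sorted(adjacency.get(node, ()))]
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated


adjacency = build_adjacency(edges)
updated = message_passing_step(adjacency, features)
print(updated["P1"])  # mean of the features of P1 and its only neighbour, P2
```

After one step each protein's features already reflect its direct binding partners; repeating the step lets information travel further along the graph.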

Deep learning, which includes GNNs, is a subset of machine learning within artificial intelligence. Deep networks are composed of multiple layers that can model non-linear dependencies, and so they can learn without supervision from unlabelled or unstructured data. They have been applied in many settings, including image analysis (Cao et al., 2018). This ability to identify complex patterns in volumes of data too large for humans to inspect makes deep learning well suited to bioinformatics, where the data are noisy and span multiple scales, as is the case in systems biology (Min et al., 2017). Deep learning can offer strong performance and scalability for time-consuming tasks (Cao et al., 2018). This is ideal for large biological datasets like those generated by large consortia. For example, The Cancer Genome Atlas has multi-omics data from over 30,000 patients across different types of cancer (Lim et al., 2019; Tomczak et al., 2015).

Gene regulatory networks describe the mechanisms that regulate gene expression and thereby control which proteins are produced from activated genes. Regulation occurs at many stages of protein synthesis, from transcription through translation to post-translational modification (Emmert-Streib et al., 2014). Graphs can be used to identify links between genes, and so which genes regulate the expression of others, as well as relationships such as enhancer-gene links. Other networks, such as protein-protein interaction, drug-drug interaction and metabolic networks, can also be modelled (Sun et al., 2020; Emmert-Streib et al., 2014).
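As a small illustration of "which genes are regulating the expression of others", a regulatory network can be stored as a directed graph and searched backwards from a gene of interest. This is only a sketch: the gene names and edges below are placeholders, not real regulatory relationships, and the function name is made up for this example.

```python
from collections import deque

# Hypothetical regulator -> targets edges; gene names are placeholders.
regulates = {
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneD"],
    "geneC": ["geneD"],
    "geneD": [],
}


def upstream_regulators(regulates, target):
    """Return every gene that directly or indirectly regulates `target`."""
    # Invert the edges so we can walk from a target back to its regulators.
    regulated_by = {}
    for regulator, targets in regulates.items():
        for gene in targets:
            regulated_by.setdefault(gene, set()).add(regulator)

    # Breadth-first search over the inverted graph.
    seen = set()
    queue = deque([target])
    while queue:
        gene = queue.popleft()
        for regulator in regulated_by.get(gene, ()):
            if regulator not in seen:
                seen.add(regulator)
                queue.append(regulator)
    return seen


print(sorted(upstream_regulators(regulates, "geneD")))
```

The direction of each edge matters here, unlike in a binding-interaction graph: geneA sits upstream of geneD, but not the other way round.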

Recent work in protein structure prediction has focused on protein folding, i.e. predicting the 3D structure of a protein from its amino acid sequence. AlphaFold is a method that predicts the distances between pairs of amino acids and the angles of the bonds that connect them. The algorithm can therefore estimate which amino acids are near each other, and how confident those estimates are, when inferring the 3D structure (Senior et al., 2020).

Deep learning can also be applied to chemical compound screens to find hits against targets, and to the study of drug properties such as absorption, distribution, metabolism and excretion. There is considerable interest in this area because drug development is a highly laborious and expensive process (David et al., 2019). Moreover, drug-drug interaction networks are important in polypharmacy, where treating complex disease mechanisms requires multiple drugs, and so these networks help to decipher potential toxicity (Ryu et al., 2018). Deep learning is valuable in situations like these, where doing the task manually is time-consuming or simply not feasible (Cao et al., 2018).

An interesting application of GNNs has been the mapping of anti-cancer compounds in foods. Recent data indicate that as much as 30-40% of cancers could be prevented by dietary changes. Foods rich in molecules like flavonoids have been implicated in the modulation of cell signalling pathways within tumour cells, and so these molecules can inhibit angiogenesis and metastasis. The GNN learns the molecular networks already targeted by current anti-cancer drugs and uses them to predict food molecules with similar effects (Veselkov et al., 2019).

Another intriguing application is learning causal networks of disease with Mendelian randomization. Mendelian randomization uses genetic variation to investigate the causal relationships between risk factors and disease in large epidemiological studies. It is hard to determine causality from genetic data directly, and harder still to infer a causal regulatory network of genes. The algorithm therefore combines Mendelian randomization with the PC algorithm, which is used for learning causal graphs. This circumvents the problem that most data point only to association, and gets closer to the causes of disease (Badsha and Fu, 2019).

Despite the applications and benefits described so far, deep learning has its limitations. Its lack of interpretability is one of the main caveats. The black-box nature of these models is a big challenge, especially since in biology we are interested in understanding the mechanisms behind biological processes rather than only predicting the outputs of such networks. Moreover, the lack of interpretability means these models cannot currently be relied on for medical decisions in a clinical context (Nicholson Price, 2018). Other issues are that large datasets are required and that data from biological experiments often suffer from quality problems (Min et al., 2017).

In conclusion, GNNs have been reported to achieve better results than classical machine learning techniques, highlighting their potential as machine learning models and their applicability to biological problems. It is an exciting field to be in, with vast applications in bioinformatics and in answering key questions that underlie disease (Min et al., 2017). Despite the challenges deep learning faces, it has produced impressive results in proteomics, network analysis and drug discovery. The amount of data from high-throughput experiments will continue to expand, giving many opportunities for deep learning to solve key problems underpinning biology (Min et al., 2017; Ryu et al., 2018; Senior et al., 2020).


Sun, M., Zhao, S., Gilvary, C., Elemento, O., Zhou, J. & Wang, F. (2020). Graph convolutional networks for computational drug development and discovery. Briefings in Bioinformatics. 21(3), 919-935. Available at:

Yang, F., Fan, K., Song, D. & Lin, H. (2020). Graph-based prediction of protein-protein interactions with attributed signed graph embedding. BMC Bioinformatics. 21(1), 1-16. Available at:

Muzio, G., O’Bray, L. & Borgwardt, K. (2021). Biological network analysis with deep learning. Briefings in Bioinformatics. 22(2), 1515-1530. Available at:

Eetemadi, A. & Tagkopoulos, I. (2019). Genetic neural networks: an artificial neural network architecture for capturing gene expression relationships. Bioinformatics. 35(13), 2226-2234. Available at:

Gramatica, R., Matteo, T. D., Giorgetti, S., Barbiani, M., Bevec, D. & Aste, T. (2014). Graph theory enables drug repurposing-How a mathematical model can drive the discovery of hidden mechanisms of action. PLOS ONE. 9(1), 84912. Available at:

Cao, C., Liu, F., Tan, H., Song, D., Shu, W., Li, W., Zhou, Y., Bo, X. & Xie, X. (2018). Deep learning and its applications in biomedicine. Genomics, Proteomics & Bioinformatics. 16(1), 17-32. Available at: doi: 10.1016/j.gpb.2017.07.003

Min, S., Lee, B. & Yoon, S. (2017). Deep learning in bioinformatics. Briefings in Bioinformatics. 18(5), 851-869. Available at:

Lim, S. B., Chua, L. K., Yeong, J. P. S., Tan, S. J., Lim, W. & Lim, C. T. (2019). Pan-cancer analysis connects tumor matrisome to immune response. npj Precision Oncology. 3(1), 1-9. Available at:

Tomczak, K., Czerwinska, P. & Wiznerowicz, M. (2015). The cancer genome atlas (TCGA): an immeasurable source of knowledge. Contemp Oncol (Pozn). 19(1A), A68-A77. Available at: doi: 10.5114/wo.2014.47136

Emmert-Streib, F., Dehmer, M. & Haibe-Kains, B. (2014). Gene regulatory networks and their applications: understanding biological and medical problems in terms of networks. Frontiers in cell and developmental biology. 2, 38. Available at:

Senior, A. W., Evans, R., Jumper, J., Kirkpatrick, J., Sifre, L., Green, T., Qin, C., Zidek, A., Nelson, A. W. R., Bridgland, A., Penedones, H., Peterson, S., Simonyan, K., Crossan, S., Kohli, P., Jones, D. T., Silver, D., Kavukcuoglu, K. & Hassabis, D. (2020). Improved protein structure prediction using potentials from deep learning. Nature. 577, 706-710. Available at:

David, L., Arús-Pous, J., Karlsson, J., Engkvist, O., Bjerrum, E. J., Kogej, T., Kriegl, J. M., Beck, B. & Chen, H. (2019). Applications of deep-learning in exploiting large-scale and heterogeneous compound data in industrial pharmaceutical research. Frontiers in pharmacology. 10, 1303. Available at:

Ryu, J. Y., Kim, H. U. & Lee, S. Y. (2018). Deep learning improves prediction of drug-drug and drug-food interactions. PNAS. 115(18), 4304-4311. Available at:

Veselkov, K., Gonzalez, G., Aljifri, S., Galea, D., Mirnezami, R., Youssef, J., Bronstein, M. & Laponogov, I. (2019). Hyperfoods: Machine intelligent mapping of cancer-beating molecules in foods. Scientific reports. 9, 9237. Available at:

Badsha, B. & Fu, A. Q. (2019). Learning causal biological networks with principle of Mendelian randomisation. Frontiers in genetics. 10, 460. Available at:

Nicholson Price, W. (2018). Big data and black-box medical algorithms. Science translational medicine. 10(471), 5333. Available at: doi: 10.1126/scitranslmed.aao5333
