Life beyond four bases

By Sashini Ranawana

For over 3.8 billion years, life has evolved within the constraints of four bases: four seemingly unremarkable nucleotides that have given rise to every organism currently known. Together they constitute the genetic code, a language system that plays a critical role in the fundamental mechanisms of transcription and translation. Within the parameters of this system, only 64 mRNA codons are possible (4 × 4 × 4 triplets), more than enough to code for the 20 canonical amino acids found in proteins. There have recently been notable efforts to widen our genetic alphabet. These experiments have shed light on the unconventional methods required to produce and manipulate chemical bases. They have also forced us to consider the scope for synthetic DNA in the field of molecular genetics.
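The codon arithmetic above can be checked directly. The short Python sketch below counts the triplets available to alphabets of four, six and eight letters. The function name `codon_count` and the letter strings are illustrative choices, not something taken from the studies discussed in this article.

```python
from itertools import product


def codon_count(alphabet: str, codon_length: int = 3) -> int:
    """Count distinct codons by enumerating every ordered combination of letters."""
    return sum(1 for _ in product(alphabet, repeat=codon_length))


# Letter sets are illustrative: the standard RNA alphabet, then hypothetical
# alphabets extended with one and with two extra base pairs.
alphabets = {
    "four letters": "ACGU",
    "six letters": "ACGUXY",
    "eight letters": "ACGUSBPZ",
}

for name, letters in alphabets.items():
    print(f"{name}: {codon_count(letters)} possible triplets")
```

With triplet codons, the count rises from 64 to 216 for six letters and 512 for eight.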

Initial efforts to create DNA bases focused primarily on mimicking the structure of pre-existing nucleotides, rather than on their intermolecular chemical interactions. In 2014, a team led by Professor Floyd Romesberg at the Scripps Research Institute illustrated this approach by producing two complementary nucleotides, X and Y (dNaM and dTPT3 respectively), which pair purely through hydrophobic forces instead of the usual hydrogen bonds (Malyshev and Romesberg, 2015). Subsequent experiments involved the development of Escherichia coli-based semi-synthetic organisms (SSOs) as a way to test the storage and information transfer systems of DNA with three base pairs. The YZ4 bacterial chromosome was modified with genes encoding the nucleoside triphosphate transporter PtNTT2 and the endonuclease Cas9, while two dNaM-dTPT3 pairs were introduced into a single bacterial plasmid. The DNA replication machinery was confirmed to be functional when, after being cultured in a medium containing free dNaM and dTPT3 triphosphates, bacterial populations continued to double over the measured time period. The uninterrupted replication of the synthetic DNA was aided by the PtNTT2 protein, which allowed the triphosphates to move into the bacterial cell. It was further facilitated by the Cas9 enzyme, which rapidly degraded plasmids that had lost the unnatural X-Y base pair (Zhang, 2017a). These promising results set the stage for further research into the information flow of synthetic genetic systems.

The next question the team considered was whether information could be transferred from DNA containing hydrophobic bases to mRNA, and consequently to protein. The gene for superfolder green fluorescent protein (sfGFP), with its codon at position 151 changed from TAC to AXC, was used to investigate this mechanism. This gene, along with a gene encoding tRNAPyl with a complementary GYT sequence at its anticodon, was inserted into a plasmid and introduced into a YZ3 bacterial cell. The hope was that T7 RNA polymerase could recognize and transcribe the synthetic base, and that tRNAPyl could incorporate the unnatural amino acid PrK at the corresponding position in the polypeptide chain. Fluorescence levels indicated that the transcription and translation machinery of the SSO produced mRNA containing the unnatural base, and sfGFP molecules containing the non-proteinogenic amino acid (Zhang, 2017b). This is evidence that the central dogma can extend to a genome with more than four bases.
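The relationship between the AXC codon and the GYT anticodon can be verified with a few lines of Python. This is a minimal sketch that assumes only the pairing rules described here (A with T, C with G, X with Y) and writes both sequences in DNA notation, as the text does. The names `PAIRS` and `reverse_complement` are illustrative.

```python
# Pairing rules: the natural A-T and C-G pairs plus the unnatural X-Y pair.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C", "X": "Y", "Y": "X"}


def reverse_complement(sequence: str) -> str:
    """Return the strand that pairs with `sequence`, read in the opposite direction."""
    return "".join(PAIRS[base] for base in reversed(sequence))


codon = "AXC"      # modified sfGFP codon at position 151
anticodon = "GYT"  # anticodon sequence described for the tRNAPyl gene

# The anticodon should be the reverse complement of the codon.
print(reverse_complement(codon) == anticodon)
```

Reading the codon backwards and swapping each base for its partner gives GYT, so the check prints `True`.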

The number of potential applications for hydrophobic base analogues is, however, limited, since the lack of hydrogen bonds between X and Y prevents continuous rows of these base pairs on a DNA molecule. When arranged in a line without any standard A, C, T or G nucleotides interspersed amongst them, the unnatural bases collapse upon each other and distort the double helical structure of the DNA molecule (Williams, 2017). Synthetic bases which use this type of bonding therefore need to be spaced out in a specific arrangement, something the highly variable sequences of the genetic code do not allow. Remarkably, however, a 2019 study by Professor Steven Benner’s team described synthetic bases which, when incorporated into DNA, circumvent this structural problem. The resulting “Hachimoji” DNA effectively adds four new letters to the genetic alphabet. The two new base pairs, S-B and P-Z, form hydrogen bonds like standard pairs, and can therefore be introduced in uninterrupted stretches. Analytical tests performed on three molecules containing both unnatural and natural bases confirmed that they assumed a B-DNA configuration, with other measurements, such as the helix width and the number of base pairs per turn, also falling within the range of normal DNA (Hoshika, 2019).
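The spacing constraint on hydrophobic base pairs can be expressed as a simple sequence check. The sketch below flags any stretch of adjacent X or Y bases in a strand. The cutoff of two or more adjacent unnatural bases is an assumption made purely for illustration, not a threshold reported in the studies, and `find_unnatural_runs` is a hypothetical helper name.

```python
import re

# Assumption for illustration: any run of two or more adjacent unnatural
# bases (X or Y) is treated as a problem stretch.
UNNATURAL_RUN = re.compile(r"[XY]{2,}")


def find_unnatural_runs(sequence: str) -> list[tuple[int, str]]:
    """Return (start index, run) for each stretch of adjacent X/Y bases."""
    return [(m.start(), m.group()) for m in UNNATURAL_RUN.finditer(sequence)]


print(find_unnatural_runs("ATGXCAYTG"))   # unnatural bases separated by natural ones
print(find_unnatural_runs("ATGXYXCATG"))  # three unnatural bases in a row
```

A sequence made of the hydrogen-bonding S, B, P and Z letters would pass this check trivially, which mirrors the point that Hachimoji pairs can be placed in uninterrupted stretches.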

One key aim of the study was to determine whether the information contained in Hachimoji DNA could be transferred in a complementary manner to RNA. This depended almost entirely on finding a polymerase enzyme that could recognize all of the synthetic RNA nucleotides. To test this, four DNA molecules, each containing one unnatural S, B, P or Z base, were used as templates. When provided with free STP, BTP, PTP and ZTP monomers, T7 RNA polymerase incorporated all of them into the RNA molecule except STP. A variant of this enzyme, ‘FAL’, showed high fidelity in adding all the unnatural ribonucleotides opposite their complementary deoxyribonucleotides (Hoshika, 2019). In most systems, the last biomolecule in the flow of genetic information is a protein. By producing nucleic acids with a greater number of possible base combinations, the need for proteins could be reduced. Such a change could potentially be accompanied by an increased reliance on DNA and RNA molecules with the ability to directly influence each other and their biological environment (Singer, 2015). This could have interesting consequences for the productivity and evolution of future organisms.

What possibilities are opened up by a more variable genetic code? With more nucleotides, there is a greater capacity to store information from biological and non-biological systems. Experiments have also shown that DNA sequences containing Z and P bases have a greater ability to adhere to liver cancer cells (Zhang, 2015). This has interesting implications, especially when considering the growing possibility of creating DNA or RNA molecules able to carry out direct functions such as reaction catalysis, drug delivery, and cellular marking. It is remarkable what organisms have been able to do with combinations of only four bases, and it will be fascinating to see what can be achieved when even more are added to the mix.
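The gain in storage capacity can be put in rough numbers. The sketch below estimates the information carried by each position in a sequence as log2 of the alphabet size. It is an idealized measure that assumes every letter is equally likely and ignores any biological or chemical constraint on sequence composition. The function name `bits_per_base` is illustrative.

```python
import math


def bits_per_base(alphabet_size: int) -> float:
    """Idealized information per position, assuming all letters are equally likely."""
    return math.log2(alphabet_size)


for size in (4, 6, 8):
    print(f"{size}-letter alphabet: {bits_per_base(size):.2f} bits per base")
```

Under this measure, a 100-base sequence could in principle hold 200 bits with four letters and 300 bits with eight.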


Hoshika, S. et al., (2019). Hachimoji DNA and RNA: A genetic system with eight building blocks. Science [online]. 363(6429), 884-887. [Accessed 4 September 2020]. Available from: doi: 10.1126/science.aat0971

Malyshev, D. and Romesberg, F. E., (2015). The Expanded Genetic Alphabet. Angewandte Chemie International Edition [online]. 54(41), 11930-11944. [Accessed 28 August 2020]. Available from: doi: 10.1002/anie.201502890

Singer, E., (2015). New Letters Added to the Genetic Alphabet. Quanta Magazine [online]. [Accessed 3 September 2020]. Available from: 

Williams, R., (2017). Six-Letter DNA Alphabet Produces Proteins in Cells. The Scientist [online]. [Accessed 4 September 2020]. Available from: 

Zhang, L. et al., (2015). Evolution of Functional six-nucleotide DNA. Journal of the American Chemical Society [online]. 137(21), 6734-6737. [Accessed 5 September 2020]. Available from: doi: 10.1021/jacs.5b02251

Zhang, Y. et al., (2017a). A semisynthetic organism engineered for the stable expansion of the genetic alphabet. Proceedings of the National Academy of Sciences [online]. 114(6), 1317-1322. [Accessed 1 September 2020]. Available from: doi: 10.1073/pnas.1616443114

Zhang, Y. et al., (2017b). A semi-synthetic organism that stores and retrieves increased genetic information. Nature [online]. 551(7682), 644-647. [Accessed 2 September 2020]. Available from: doi: 10.1038/nature24659
