DNA Encoded Data: A Replacement for Hard Drives?

By Nick Bitterlich

We all own that one hard drive that stores our fondest memories in an intricate network of chips. It writes and rewrites data whenever we save or download files. It seems USB sticks and hard drives can never have enough storage, given the tremendous amounts of data generated every day. The first commercial hard drive, shipped in 1956, weighed over a ton and could hold a mere 5 MB (Computer History Museum, n.d.). Meanwhile, the iPhone 12 Pro carries up to 512 GB in a sleek body weighing around 200 grams. If this trend continues, future storage options will be tiny. In fact, they might be microscopic: the size of a 0.1 mm cell.

Scientists have proposed that the 1s and 0s of binary code used to store files could be converted into the As, Ts, Cs and Gs of DNA. Although synthesising (“writing”) and sequencing (“reading”) DNA is tediously slow, and a mistake would be made roughly once every 100 nucleotides, it might just be the answer to storing data sustainably in the long term. DNA is not only invisible to the naked eye but is also supercoiled, creating an incredibly dense “databank” that holds the instructions for the synthesis of human proteins. By 2040, our consumption of microchip-grade silicon is predicted to outweigh its supply fifty-fold (Zhirnov et al., 2016). Additionally, a hard drive can only store data reliably for 5–10 years before its quality deteriorates. In comparison, bacterial genetics studies suggest E. coli could increase data retention tenfold while increasing data density (bits per cm³) a thousandfold (Extance, 2016). This would imply that, instead of depleting the supply of silicon, the world’s data storage needs could be met by a mere kilogram of DNA.

Fascinating research into this utopian-sounding idea has been ongoing since 2011, when a Harvard team successfully proved the concept. Then, in 2013, Goldman and Birney encoded 740 kB of hard-disk storage into DNA code and subsequently reconstructed the files (an image and an audio file among others) with 100% accuracy (Goldman et al., 2013). In 2016, a collaboration between Microsoft and researchers at the University of Washington managed a staggering 200 MB (Extance, 2016).

The synthesis of data-storing DNA is relatively straightforward. As is the case with computers, information and text must first be converted to binary code. Adenine or cytosine can then represent a 0, whereas guanine or thymine reads as a 1. Because each bit can be written with either of two bases, this could avoid repeated runs of Cs and Gs, which can form stem-loops or pseudo-knots that induce errors. Alternatively, the binary file can be converted into “trits”, a base-3 code of 0s, 1s and 2s. Each trit then selects one of the three nucleotides that differ from the one just used, so the same base never appears twice in a row. These trits act as instructions for the synthesis of DNA fragments that overlap enough for every part of the code to appear in multiple copies, and this redundancy decreases the error rate (Extance, 2016). Microsoft has recently managed to partially automate this process by using microfluidic pumps to feed DNA into a MinION reader (Langston, 2019). Before the data is read, the DNA is amplified by polymerase chain reaction (PCR) with Taq polymerase, and a BsaI restriction site is added to aid ligation (Takahashi et al., 2019).
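The two encodings described above can be sketched in a few lines of Python. This is a minimal illustration rather than the scheme used in any of the cited studies: the function names are hypothetical, the fixed six trits per byte is an assumption made for simplicity, and the starting "previous base" of A is an arbitrary convention.

```python
import random

BASES = "ACGT"
ZERO, ONE = "AC", "GT"  # A or C stands for 0, G or T stands for 1


def bits_to_dna(bits, rng=None):
    """One bit per base, choosing between the two allowed bases so that
    no base is repeated twice in a row."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    out = []
    for bit in bits:
        choices = ZERO if bit == "0" else ONE
        options = [c for c in choices if not out or c != out[-1]]
        out.append(rng.choice(options))
    return "".join(out)


def dna_to_bits(dna):
    return "".join("0" if base in ZERO else "1" for base in dna)


def bytes_to_trits(data):
    """Write each byte as six base-3 digits (assumption: 3**6 = 729 >= 256)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits):
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def trits_to_dna(trits, start="A"):
    """Each trit picks one of the three bases that differ from the previous one."""
    prev, out = start, []
    for trit in trits:
        candidates = [b for b in BASES if b != prev]
        prev = candidates[trit]
        out.append(prev)
    return "".join(out)


def dna_to_trits(dna, start="A"):
    prev, trits = start, []
    for base in dna:
        candidates = [b for b in BASES if b != prev]
        trits.append(candidates.index(base))
        prev = base
    return trits


message = b"DNA"
strand = trits_to_dna(bytes_to_trits(message))
decoded = trits_to_bytes(dna_to_trits(strand))
```

Here `strand` holds 18 bases with no base repeated back to back, and `decoded` equals the original `message`. A real pipeline would add the overlapping fragments and error correction on top of this mapping.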

Despite the supposed simplicity, reading DNA on the scale required to store modern files is no easy task. Using conventional sequencing methods, all of the data would have to be retrieved at once, which is a cumbersome process. However, the University of Washington team devised a new method that enables researchers to amplify selected snippets using PCR, which can then be extracted and read. Milenkovic et al. further improved this random-access approach by including “address” sequences at both ends of the DNA-encoded file (Extance, 2016). This way, retrieving data would be just as easy as identifying a product in the supermarket by its barcode.
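The address idea can be illustrated with a toy model in Python. This is only a sketch under loud assumptions: strands are plain strings, the address sequences and file names are invented for the example, and "amplification" is reduced to exact string matching, whereas real PCR primers bind to complementary sequences and tolerate some mismatch.

```python
# Hypothetical address pairs (forward, reverse) for two stored files.
ADDRESSES = {
    "photo": ("ACGTACGTAC", "TGCATGCATG"),
    "song": ("GATCGATCGA", "CTAGCTAGCT"),
}


def tag(payload, forward, reverse):
    """Flank a payload with its file's address sequences."""
    return forward + payload + reverse


def retrieve(pool, forward, reverse):
    """Return the payloads of all strands flanked by the given address pair,
    mimicking selective amplification of one file from a mixed pool."""
    hits = []
    for strand in pool:
        long_enough = len(strand) >= len(forward) + len(reverse)
        if long_enough and strand.startswith(forward) and strand.endswith(reverse):
            hits.append(strand[len(forward):len(strand) - len(reverse)])
    return hits


# A mixed pool holding fragments of both files in arbitrary order.
pool = [
    tag("CATGCATA", *ADDRESSES["song"]),
    tag("TGCAGTCA", *ADDRESSES["photo"]),
    tag("GTACTGAC", *ADDRESSES["photo"]),
]

photo_fragments = retrieve(pool, *ADDRESSES["photo"])
```

Only the two strands tagged with the photo's addresses are returned, so the rest of the pool never has to be read.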

The cost of writing DNA remains incredibly high, and without proper funding this future will be nothing more than a dream. However, the cost of sequencing the human genome has fallen 2-million-fold, and the cost of synthesising DNA will likely follow a similar trend. Thus, this means of storing data is likely to become a viable option in the near future.

References:

Computer History Museum. (n.d.). 1956: First commercial hard disk drive shipped | The Storage Engine | Computer History Museum. [online] Available at: https://www.computerhistory.org/storageengine/first-commercial-hard-disk-drive-shipped/.

Extance, A. (2016). How DNA could store all the world’s data. Nature, [online] 537(7618), pp.22–24. Available at: https://www.nature.com/news/how-dna-could-store-all-the-world-s-data-1.20496 [Accessed 27 Nov. 2019].

Goldman, N., Bertone, P., Chen, S., Dessimoz, C., LeProust, E.M., Sipos, B. and Birney, E. (2013). Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature, [online] 494(7435), pp.77–80. Available at: https://www.nature.com/articles/nature11875 [Accessed 30 Mar. 2021].

Langston, J. (2019). Microsoft, UW demonstrate first fully automated DNA data storage. [online] Microsoft. Available at: https://news.microsoft.com/innovation-stories/hello-data-dna-storage/ [Accessed 22 Aug. 2019].

Stephenson, A., Willsey, M., McBride, J., Newman, S., Nguyen, B., Takahashi, C., Strauss, K. and Ceze, L. (2020). PurpleDrop: A Digital Microfluidics-Based Platform for Hybrid Molecular-Electronics Applications. IEEE Micro, [online] 40(5), pp.76–86. Available at: https://ieeexplore.ieee.org/document/9130165 [Accessed 30 Mar. 2021].

Takahashi, C.N., Nguyen, B.H., Strauss, K. and Ceze, L. (2019). Demonstration of End-to-End Automation of DNA Data Storage. Scientific Reports, [online] 9(1), p.4998. Available at: https://www.nature.com/articles/s41598-019-41228-8 [Accessed 30 Mar. 2021].

Zhirnov, V., Zadegan, R.M., Sandhu, G.S., Church, G.M. and Hughes, W.L. (2016). Nucleic acid memory. Nature Materials, [online] 15(4), pp.366–370. Available at: https://pubmed.ncbi.nlm.nih.gov/27005909/ [Accessed 30 Mar. 2021].
