By Denny Yang Tze Te
1977 marked a major breakthrough in the lab of Frederick Sanger, when he and his colleagues introduced the “dideoxy chain-termination” method to the world. Famously known as Sanger sequencing, it is an example of ‘first-generation’ sequencing. The method begins by denaturing double-stranded DNA into single strands, each of which is then annealed to an oligonucleotide primer. The primers are elongated along the single-stranded templates using a mixture of dNTPs and a small quantity of chain-terminating ddNTPs. Each primer is extended with dNTPs until a ddNTP is incorporated at the 3’ end of the growing strand; because a ddNTP lacks the 3’ hydroxyl group needed to attach the next nucleotide, elongation stops there. Since a ddNTP can be incorporated at random in place of the matching dNTP at any position, the strands terminate at varying lengths. In the later, automated form of the method, the key is attaching a distinctive fluorescent dye to each type of ddNTP, so that each base is read out as a different colour (e.g. A is conventionally displayed in green, C in blue, G in black and T in red). The terminated strands are separated by length using gel electrophoresis, and the DNA sequence is read from the order of the fluorescent signals detected at each length. This technology was the first stepping stone that led the world to understand a little more about the central dogma of biology (Gomes & Korf, 2018).
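The chain-termination logic described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the template sequence, the function names and the fixed termination probability `p_terminate` are all hypothetical, the template is assumed to be written in the direction the polymerase reads it, and real base-calling works on fluorescence traces rather than on exact strings.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_fragment(template, p_terminate, rng):
    """Extend a new strand along the template until a ddNTP is incorporated.

    Returns the terminated fragment, or None if the end of the template
    is reached without a terminator being incorporated.
    """
    fragment = []
    for base in template:
        fragment.append(COMPLEMENT[base])
        # Assumption: each incorporated nucleotide is a ddNTP with a fixed
        # probability, a simplification of the real dNTP/ddNTP ratio.
        if rng.random() < p_terminate:
            return "".join(fragment)
    return None


def read_sequence(template, n_strands=20000, p_terminate=0.1, seed=0):
    """Recover the synthesized sequence from many randomly terminated strands."""
    rng = random.Random(seed)
    # The labelled terminal base (the ddNTP) observed for each fragment length.
    terminal_base_by_length = {}
    for _ in range(n_strands):
        fragment = synthesize_fragment(template, p_terminate, rng)
        if fragment is not None:
            terminal_base_by_length[len(fragment)] = fragment[-1]
    # Electrophoresis orders fragments by length; reading the label of each
    # band from shortest to longest spells out the synthesized strand.
    ordered_lengths = sorted(terminal_base_by_length)
    return "".join(terminal_base_by_length[n] for n in ordered_lengths)


template = "TACGGATCCGTA"  # hypothetical template
print(read_sequence(template))
```

With enough simulated strands, every fragment length is represented, so the read-out is the complement of the template, from which the template itself follows.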
In parallel with the development of ‘first-generation’ sequencing, interest in sequencing the transcriptome was also rising. The transcriptome refers to all the messenger RNA (mRNA) expressed, or transcribed, in a cell or organism (Srivastava et al., 2019). Sequencing DNA only tells us about the genetic profile of an organism, whereas sequencing mRNA tells us which genes its cells are actively expressing. In other words, sequencing RNA at a given timepoint in a multicellular organism’s lifetime reveals what its constituent cells are producing and doing at that moment. This technology helps answer some important questions in biology: how do specialized cells come to express a particular phenotypic property? Or, more simply, how do the cells of a multicellular organism differ from one another?
The first landmark in RNA sequencing came from Walter Fiers’ lab at the University of Ghent, Belgium (Fiers et al., 1976). They successfully sequenced the RNA genome of bacteriophage MS2, which is 3,569 nucleotides long. The process started by radioactively labelling MS2 RNA with phosphorus-32 and partially digesting it with a single-strand-specific ribonuclease at 0 degrees Celsius. The digest was then fractionated on a polyacrylamide slab gel, and individual bands from the gel were further separated by two-dimensional gel electrophoresis. The resulting pure RNA fragments were sequenced using methodology described by Frederick Sanger and his team. Viral RNA genomes are relatively stable, so generating complementary DNA (cDNA) was not required before sequencing. Modern RNA sequencing methods, by contrast, are commonly used to sequence cellular mRNAs, which are transient, so generating stable cDNA from them is a crucial step. All in all, this study by Walter Fiers and his team was a major milestone for bioinformatics, giving us insight into transcriptomic sequencing and sparking interest in it.
One technique for analysing the transcriptome is RNA-seq (short for ‘RNA sequencing’), which relies on next-generation sequencing to reveal all the RNA expressed in a biological sample at a given timepoint. There are two major types of RNA-seq: bulk RNA-seq and single-cell RNA-seq (scRNA-seq) (Kiselev, 2021). Bulk RNA-seq measures the average expression level of each gene across a population of input cells. The input cells are often homogeneous, and the method is used for comparative transcriptomics or to study tissue dynamics. scRNA-seq measures the distribution of expression levels of each gene across a heterogeneous population of input cells. This single-cell technology enables cell identification, disease profiling, and the study of heterogeneity in cell responses and stochasticity of gene expression. scRNA-seq was first developed in 2009 by Fuchou Tang and colleagues (Tang et al., 2009) and gained widespread popularity in the early 2010s in fields of biology such as bioinformatics and developmental biology.
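The difference between the two readouts can be made concrete with a toy calculation. The sketch below assumes an invented expression table (`counts_per_cell`, with made-up gene names and read counts) and hypothetical helper functions; it only illustrates why an average can hide differences between cells that a per-cell measurement preserves.

```python
from statistics import mean

# Hypothetical read counts for two genes in four cells (values are invented).
counts_per_cell = [
    {"geneA": 100, "geneB": 50},
    {"geneA": 0, "geneB": 50},
    {"geneA": 100, "geneB": 50},
    {"geneA": 0, "geneB": 50},
]


def bulk_profile(cells):
    """Bulk RNA-seq view: one average expression value per gene."""
    genes = cells[0].keys()
    return {gene: mean(cell[gene] for cell in cells) for gene in genes}


def single_cell_profile(cells):
    """scRNA-seq view: the per-cell expression values of each gene are kept."""
    genes = cells[0].keys()
    return {gene: [cell[gene] for cell in cells] for gene in genes}


bulk = bulk_profile(counts_per_cell)
single = single_cell_profile(counts_per_cell)

print(bulk)    # both genes have the same average
print(single)  # geneA is switched on in only half of the cells
```

In this example both genes look identical in the bulk profile, while the single-cell profile shows that one of them is expressed in only a subset of cells.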
scRNA-seq builds on bulk RNA-seq, so the fundamental RNA-seq workflow is still present (CD Genomics, 2021). The process starts with library preparation, in which individual cells are isolated from a heterogeneous cell sample either by microfluidics, where single cells are trapped one per well, or by microplate techniques, where a laser is used to isolate single cells into microplate slots. The final step of library preparation is reverse transcription of the RNA into stable cDNA, which is then indexed with oligonucleotide barcodes. The labelled cDNA can then be sent to companies such as Helicos and Oxford Nanopore Technologies, which sequence it directly using massively parallel (next-generation) sequencing. From the sequencing results, clustering analysis can be performed to group together cells with similar gene expression and ultimately unmask the complexity of the heterogeneous tissue. This includes identifying the number and the transcriptomic signatures of the various cell types in a given heterogeneous sample (Huh et al., 2019).
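The last two steps of this workflow, assigning reads to cells by barcode and then grouping cells with similar expression, can be sketched as follows. Everything here is an assumption made for illustration: the barcodes, gene names, read counts and the `max_distance` cutoff are invented, and the grouping is a simple single-linkage rule with a distance threshold, not the clustering method of the cited paper or of any particular analysis package.

```python
from collections import Counter, defaultdict
from math import dist

# Hypothetical sequencing output: one (cell barcode, gene) pair per read.
reads = (
    [("AAAC", "geneX")] * 4 + [("AAAC", "geneY")] * 1
    + [("CCGT", "geneX")] * 5
    + [("GGTA", "geneY")] * 5
    + [("TTAG", "geneX")] * 1 + [("TTAG", "geneY")] * 4
)


def count_matrix(reads):
    """Demultiplex reads by barcode and count reads per gene in each cell."""
    counts = defaultdict(Counter)
    for barcode, gene in reads:
        counts[barcode][gene] += 1
    return counts


def normalized_profile(cell_counts, genes):
    """Convert raw counts to fractions so sequencing depth does not dominate."""
    total = sum(cell_counts.values())
    return [cell_counts[gene] / total for gene in genes]


def cluster_cells(counts, max_distance=0.5):
    """Group cells whose expression profiles lie within max_distance (single linkage)."""
    genes = sorted({gene for cell in counts.values() for gene in cell})
    profiles = {bc: normalized_profile(cell, genes) for bc, cell in counts.items()}
    clusters = []
    for barcode in sorted(profiles):
        # Find every existing cluster containing a cell close to this one.
        linked = [
            cluster for cluster in clusters
            if any(dist(profiles[barcode], profiles[other]) <= max_distance
                   for other in cluster)
        ]
        # Merge the linked clusters together with the new cell.
        merged = [barcode]
        for cluster in linked:
            merged.extend(cluster)
            clusters.remove(cluster)
        clusters.append(sorted(merged))
    return sorted(clusters)


print(cluster_cells(count_matrix(reads)))
```

In this toy data set the cells dominated by `geneX` end up in one group and those dominated by `geneY` in another, which is the kind of structure clustering analysis aims to reveal in a real heterogeneous sample.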
One striking application of scRNA-seq comes from developmental biology, in the investigation of differential gene expression in early mammalian development. Magdalena Zernicka-Goetz and her team at the University of Cambridge used scRNA-seq to characterize the transcriptomes of individual cells in early mouse embryos. They identified Sox21 as the most differentially expressed gene between the cells of the 4-cell stage embryo. Before scRNA-seq was introduced, the cells at the 4-cell stage, which are morphologically similar, were thought to be homogeneous. Sox21 is known to repress premature differentiation, and its differential expression tells us that the cells of 4-cell stage embryos already differ in their cell fate and developmental potential (Goolam et al., 2021).
Recent advances in RNA sequencing technologies have brought an extremely powerful tool to many biological disciplines. With scRNA-seq in particular, we can now visualize and interrogate differential gene expression at the level of a single cell. A future prospect for this technique is the use of data-driven RNA network models to understand biological pathways in any given physiological condition. This technology will thus continue to help the field of biology understand differentiation and diversity, and may even support the development of personalized medicine (Ozsolak & Milos, 2010).
CD Genomics, 2021. Biofluid Small RNA-Seq – CD Genomics. [online] Available at: <https://rna.cd-genomics.com/biofluid-small-rna-seq.html?msclkid=f4a6f8cb75c51a5b4af798666f26fe52&utm_source=bing&utm_medium=cpc&utm_campaign=Services&utm_term=Biofluid%20Small%20RNA-Seq%20price&utm_content=2-4-1%20Biofluid%20Small%20RNA-Seq> [Accessed 25 May 2021].
Fiers, W., Contreras, R., Duerinck, F., Haegeman, G., Iserentant, D., Merregaert, J., Min Jou, W., Molemans, F., Raeymaekers, A., Van den Berghe, A., Volckaert, G. and Ysebaert, M., 1976. Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. Nature, 260(5551), pp.500-507.
Gomes, A. and Korf, B., 2018. Genetic Testing Techniques. Pediatric Cancer Genetics, [online] pp.47-64. Available at: <https://www.sciencedirect.com/science/article/pii/B9780323485555000053> [Accessed 25 May 2021].
Goolam, M., Scialdone, A., Graham, S., Macaulay, I., Jedrusik, A., Hupalowska, A., Voet, T., Marioni, J. and Zernicka-Goetz, M., 2021. Heterogeneity in Oct4 and Sox2 Targets Biases Cell Fate in 4-Cell Mouse Embryos. [online] Available at: <https://www.genelibs.com/article/article/ueditor/jsp/upload/file/20160727/1469589327668058777.pdf> [Accessed 25 May 2021].
Huh, R., Yang, Y., Jiang, Y., Shen, Y. and Li, Y., 2019. SAME-clustering: Single-cell Aggregated Clustering via Mixture Model Ensemble. Nucleic Acids Research, [online] 48(1), pp.86-95. Available at: <https://pubmed.ncbi.nlm.nih.gov/31777938/> [Accessed 25 May 2021].
Ozsolak, F. and Milos, P., 2010. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics, [online] 12(2), pp.87-98. Available at: <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3031867/> [Accessed 25 May 2021].
Srivastava, A., George, J. and Karuturi, R., 2019. Transcriptome Analysis. Encyclopedia of Bioinformatics and Computational Biology, [online] 3, pp.792-805. Available at: <https://www.sciencedirect.com/science/article/pii/B9780128096338201611> [Accessed 25 May 2021].
Tang, F., Barbacioru, C., Wang, Y., Nordman, E., Lee, C., Xu, N., Wang, X., Bodeau, J., Tuch, B., Siddiqui, A., Lao, K. and Surani, M., 2009. mRNA-Seq whole-transcriptome analysis of a single cell. Nature Methods, [online] 6(5), pp.377-382. Available at: <https://www.semanticscholar.org/paper/mRNA-Seq-whole-transcriptome-analysis-of-a-single-Tang-Barbacioru/538788ca293f01d4bbcf5dfceda1404b43064a27> [Accessed 25 May 2021].
Kiselev, V., 2021. 2 Introduction to single-cell RNA-seq | Analysis of single cell RNA-seq data. [online] Scrnaseq-course.cog.sanger.ac.uk. Available at: <https://scrnaseq-course.cog.sanger.ac.uk/website/introduction-to-single-cell-rna-seq.html> [Accessed 25 May 2021].