From Sanger to Next Generation Sequencing
Next generation sequencing (NGS) technology has opened up a wealth of data and vital information across diverse areas of biological science, including ancestry, predisposition to disease, the microorganisms that cause epidemics, prenatal and newborn screening, and other applications in personalized medicine. From researchers to clinicians, the use of NGS is becoming the norm, driven by high data output, falling costs, and shorter turnaround times. The field is advancing at a tremendous pace, and Frederick Sanger deserves mention here: the first breakthrough in sequencing came with Sanger sequencing, and other sequencing methods soon followed.
The basic principle of Sanger sequencing is the incorporation of dideoxynucleotides, which terminate an extending chain. Fragments of different lengths are generated, which can then be separated by size with capillary gel electrophoresis. This methodology typically yields reads of up to about 800 to 1,000 bases.
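The chain-termination idea can be illustrated with a toy model. The sketch below is an idealized simplification, not a description of any instrument: it assumes one terminated fragment per position of the newly synthesized strand, and the function names are made up for illustration.

```python
def sanger_fragments(new_strand):
    """Group terminated fragment lengths by the ddNTP that ended them.

    Idealized assumption: exactly one fragment terminates at every
    position of the newly synthesized strand.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_sequence(lanes):
    """Recover the sequence by ordering the fragments from shortest to longest."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


lanes = sanger_fragments("GATTACA")
print(lanes)
print(read_sequence(lanes))
```

Sorting the fragments by size plays the role of electrophoresis here: the terminating base of each successively longer fragment spells out the sequence.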
Meanwhile, the Human Genome Project (HGP) was initiated to understand the genetics behind disease, cancer, and other variation, along with their correlations. The aim was to sequence the genome within a reasonable amount of time. The low throughput of Sanger sequencing (reading out a single fragment at a time) made sequencing the human genome a formidable task. The project took 13 years to complete, and with it came a transition to next generation sequencing (reading multiple fragments at a time).
The graph below gives a representation of when the shift towards NGS occurred.
Figure 1: Graphical representation of First, Second, and Third Generation Sequencers (Adapted from Genetic analysis in neurology: the next 10 years.) (Pittman & Hardy, 2013)
Before we delve into the applications of NGS, it is essential to understand that library preparation is the biggest contributor to the quality of the sequencing data. The way the samples are prepared for sequencing decides their fate later. The steps below give a rough overview; keep in mind that they vary with the design of the experiment, the sequencing platform, and the goal of the experiment.
Steps of DNA Library Preparation:
- Nucleic acid Isolation
The cells are lysed with the help of a homogenizer and treated with detergents, and cellular debris and contaminants are removed, leaving behind purified DNA. The extraction protocol varies with the source, and commercial extraction kits provided by BioChain Institute, Inc. cover a broad range of samples using high-quality reagents. BioChain also offers extraction services, with qualified professionals experienced in handling difficult samples.
The starting material is gDNA or cDNA. To read more about gDNA and cDNA, please refer to https://www.biochain.com/general/cdna-vs-genomic-dna/
Purity is checked with a UV spectrophotometer by measuring the A260/A280 ratio, which falls between about 1.8 and 2.0 for pure nucleic acid samples (around 1.8 for DNA and around 2.0 for RNA). Gel electrophoresis is also performed to visualize the DNA alongside a ladder of known fragment sizes.
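As a minimal sketch of the purity check described above, the ratio can be computed and compared against the 1.8 to 2.0 window. The function name and the absorbance readings are hypothetical, and the acceptance window is taken from the text rather than from any instrument manual.

```python
def assess_purity(a260, a280, low=1.8, high=2.0):
    """Return the A260/A280 ratio and whether it lies in the accepted window."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    ratio = a260 / a280
    return ratio, low <= ratio <= high


# Hypothetical spectrophotometer readings for two samples.
for name, a260, a280 in [("sample_1", 0.75, 0.40), ("sample_2", 0.60, 0.40)]:
    ratio, is_pure = assess_purity(a260, a280)
    print(f"{name}: A260/A280 = {ratio:.2f}, within range: {is_pure}")
```

A ratio below the window usually prompts a closer look at the extraction, although the ratio alone does not identify the contaminant.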
- DNA fragmentation
All NGS instruments accept fragments only within a specified size range, so whichever of the following methods is selected should yield fragments of the right size for sequencing.
(i) Physical Fragmentation:
It uses physical force to shear DNA into fragments of the desired size. The shearing produces 5’ and 3’ overhangs.
There are three types of shearing:
(a) Acoustic shearing (Covaris)
(b) Sonication
(c) Hydrodynamic shearing (Head et al., 2014)
(ii) Chemical Fragmentation
This is typically performed for RNA. The RNA is fragmented by heat digestion in the presence of divalent cations such as Mg²⁺ or Zn²⁺, and is later converted to cDNA (Head et al., 2014).
(iii) Enzymatic Fragmentation
(a) DNase I or Fragmentase: Fragmentase is a two-enzyme mixture that produces nicked fragments with overhangs, which are ligated by E. coli DNA ligase (Knierim, Lucke, Schwarz, Schuelke, & Seelow, 2011).
(b) Transposase: the enzyme fragments the DNA and attaches adapters in the same step (Head et al., 2014).
- End Repair and Adapter Ligation
T4 DNA polymerase fills in 5’ overhangs and trims back 3’ overhangs to generate blunt ends. The 5’ ends are phosphorylated by T4 polynucleotide kinase (PNK). A-tailing of the 3’ ends is achieved using either Taq DNA polymerase or the Klenow fragment of DNA polymerase I (exo-), which improves the efficiency of adapter ligation (Head et al., 2014).
The adapters, which are specific to each platform, are ligated to both ends of the fragments with T4 DNA ligase. The suggested adapter:fragment molar ratio is about 10:1. The adapters then take part in the subsequent sequencing reaction. They are designed to avoid forming adapter dimers, which may reduce sequencing quality (Quail, Swerdlow, & Turner, 2009).
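To make the 10:1 ratio concrete, the sketch below converts a DNA mass into picomoles and sizes the adapter addition accordingly. It assumes the common approximation of about 660 g/mol per base pair of double-stranded DNA; the input mass, fragment length, and adapter stock concentration are hypothetical values chosen only for illustration.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximation)


def dsdna_pmol(mass_ng, length_bp):
    """Convert a mass of dsDNA fragments (ng) into picomoles of fragments."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def adapter_volume_ul(mass_ng, length_bp, adapter_stock_um, ratio=10.0):
    """Volume of adapter stock (µL) giving the requested adapter:fragment molar ratio.

    1 µM equals 1 pmol/µL, so pmol divided by µM gives µL.
    """
    return ratio * dsdna_pmol(mass_ng, length_bp) / adapter_stock_um


# Hypothetical input: 1000 ng of 300 bp fragments and a 15 µM adapter stock.
fragments_pmol = dsdna_pmol(1000, 300)
volume = adapter_volume_ul(1000, 300, adapter_stock_um=15)
print(f"{fragments_pmol:.2f} pmol of fragments, add {volume:.2f} µL of adapter")
```

Because the ratio is molar, shorter fragments need more adapter for the same input mass. Kit protocols may specify their own amounts, which take precedence over this estimate.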
At the end, DNA fragments tagged with adapters are obtained and undergo a purification step. This is what constitutes a “library”. BioChain’s NGS library prep kits (https://www.biochain.com/products/epigenetics-ngs/ngs-library-prep-kits/) have been designed with both high- and low-quality gDNA or cDNA in mind. The libraries are prepped with up-to-date methodologies, and ready-to-use reagents reduce hands-on time so that researchers and scientists can focus on the other aspects of the NGS workflow.
- Size Selection
Size selection may be done by running a gel manually and excising the library fragments of the desired size. Automated gel-based methods are now available that reduce time and contamination and improve reproducibility.
A popular gel-free alternative uses the selective binding of magnetic beads to DNA fragments, which is controlled by the bead:sample ratio. Changing this ratio makes it possible to remove fragments both smaller and larger than the desired target size range (double-sided size selection). The clean-up is easy and efficient, suits high-throughput work, and gives a higher DNA recovery rate (NGS clean up and size selection User manual NucleoMag ® NGS Clean-up and Size Select, 2019).
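The arithmetic behind a double-sided selection can be sketched as follows. This assumes the usual two-step scheme: a first, lower bead ratio binds the largest fragments, which are discarded with the beads, and a second bead addition to the supernatant raises the cumulative ratio so that the target fragments bind while the smallest stay in solution. The ratios 0.6x and 0.8x are hypothetical placeholders, not recommendations from the cited manual.

```python
def double_sided_bead_volumes(sample_ul, first_ratio, final_ratio):
    """Bead volumes (µL) for the two additions of a double-sided size selection.

    Both ratios are expressed relative to the ORIGINAL sample volume.
    first_ratio: beads bind the large fragments, which are discarded.
    final_ratio: cumulative ratio after the second addition to the supernatant.
    """
    if not 0 < first_ratio < final_ratio:
        raise ValueError("expected 0 < first_ratio < final_ratio")
    first_addition = first_ratio * sample_ul
    second_addition = (final_ratio - first_ratio) * sample_ul
    return first_addition, second_addition


# Hypothetical example: 50 µL of library, cut-offs at 0.6x and 0.8x.
first, second = double_sided_bead_volumes(50, 0.6, 0.8)
print(f"first addition: {first:.1f} µL, second addition: {second:.1f} µL")
```

The fragment sizes that correspond to a given ratio depend on the bead product, so the ratios themselves should come from the manufacturer's calibration.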
- Library Amplification (optional for some)
This step is optional and depends mostly on the adapters and the sample. PCR primers hybridize to the adapter sequences, so amplification enriches the adapter-ligated fragments required for clonal amplification and increases the quantity of the library so that an adequate signal is generated. A clean-up is performed after PCR to maintain the quality of the library.
BioChain has developed a cost-effective method to clean up the library without compromising performance or quality; it works equally well manually and on automated workstations. Both size selection and clean-up can be performed, and the recovery rate of the fragments has been found to be very high.
A summary of the library prep is depicted below:
Figure 2: Workflow of DNA library prep (“TUFTS – TUCF Genomics,” 2018)
- QC for Library
Spectrophotometry, fluorometry, microfluidic electrophoresis, and qPCR are used to assess library quality and quantity, including fragment sizing and the detection of impurities. Each has its own merits and drawbacks in terms of time, cost, and reliability, and researchers use whichever combination suits their experiment, such as an Agilent Bioanalyzer together with qPCR for quality control (QC) (“Pooling libraries for sequencing – OpenWetWare,” 2016).
In most cases more than one sample is sequenced at a time, so libraries carrying unique, platform-dependent barcodes are pooled at equimolar concentrations. Finally, the pooled library is loaded onto the flow cell (which carries sequences complementary to the adapters), where the sequencing will occur.
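Equimolar pooling comes down to a unit conversion followed by a division. The sketch below assumes about 660 g/mol per base pair of double-stranded DNA; the library names, concentrations, and mean fragment sizes are hypothetical, as is the target of 100 fmol per library.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximation)


def library_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a library concentration from ng/µL to nM."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def pooling_volumes(libraries, fmol_each):
    """Volume (µL) of each library that contributes the same number of fmol.

    1 nM equals 1 fmol/µL, so fmol divided by nM gives µL.
    """
    return {
        name: fmol_each / library_nm(conc, size)
        for name, (conc, size) in libraries.items()
    }


# Hypothetical libraries: name -> (ng/µL, mean fragment size in bp).
libraries = {"lib_A": (10.0, 400), "lib_B": (5.0, 350)}
volumes = pooling_volumes(libraries, fmol_each=100)
for name, volume in volumes.items():
    print(f"{name}: {volume:.2f} µL")
```

The mean fragment size matters as much as the mass concentration, which is one reason fragment sizing and quantification are both part of library QC.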
Different platforms use different chemistries to detect the signal generated as each nucleotide is read by the machine.
Figure 3: Chemistry of different sequencing platforms
The output consists of sequencing reads (stretches of nucleotides of a specified length). Post-sequencing QC is performed with software provided with the sequencer, which reports quality scores, duplicates, and other metrics. NGS can be of numerous types: whole genome, transcriptome, amplicon, exome, ChIP, and many more. Downstream analysis involves converting the raw reads into interpretable data, by either de novo assembly or mapping the reads to a reference genome, and then identifying differential gene expression, mutations, or splice variants.
It is indeed a huge feat that we have gone from the HGP to sequencing thousands of different organisms while reducing the cost of sequencing a human genome to around $1000. The exabyte-scale (10^18 bytes) data output from these sequencing machines has given rise to platforms for data sharing among the scientific community. This provides an opportunity to develop software and programs to analyse the data. On the other hand, the challenge of data storage is still being addressed.
Currently, the industry is flooded with different sequencing platforms, and the choices may become overwhelming.
Thus, informed decisions are needed at every step: how the sample is going to be extracted and processed; whether to sequence single-end or paired-end; how much to sequence, given the cost and the aim of the experiment; which sequencing platform to choose; and what data storage and computational capacity are needed to make sense of the data and turn it into a scientific reality. All of this requires well-trained scientists, bioinformaticians, and clinicians to come together and apply their expertise.
- Head, S. R., Komori, H. K., LaMere, S. A., Whisenant, T., Van Nieuwerburgh, F., Salomon, D. R., & Ordoukhanian, P. (2014). Library construction for next-generation sequencing: Overviews and challenges. BioTechniques, 56(2). https://doi.org/10.2144/000114133
- Knierim, E., Lucke, B., Schwarz, J. M., Schuelke, M., & Seelow, D. (2011). Systematic Comparison of Three Methods for Fragmentation of Long-Range PCR Products for Next Generation Sequencing. PLoS ONE, 6(11), e28240. https://doi.org/10.1371/journal.pone.0028240
- Pittman, A., & Hardy, J. (2013). Genetic Analysis in Neurology. JAMA Neurology, 70(6), 696. https://doi.org/10.1001/jamaneurol.2013.2068
- Pooling libraries for sequencing – OpenWetWare. (2016). Retrieved December 5, 2019, from Openwetware.org website: https://openwetware.org/wiki/Pooling_libraries_for_sequencing
- Quail, M. A., Swerdlow, H., & Turner, D. J. (2009). Improved Protocols for the Illumina Genome Analyzer Sequencing System. Current Protocols in Human Genetics, 62(1). https://doi.org/10.1002/0471142905.hg1802s62
- Rius, M. (2015). Methods to Fragment DNA for Next Generation Sequencing. Retrieved November 19, 2019, from https://cpb-us-e1.wpmucdn.com/ website: https://cpbuse1.wpmucdn.com/you.stonybrook.edu/dist/1/681/files/2015/08/methods-to-fragment-DNA-for-Next-generation-sequencing-14w258i.pdf
- NGS clean up and size selection User manual NucleoMag ® NGS Clean-up and Size Select. (2019). Retrieved from https://www.mn-net.com/Portals/8/attachments/Redakteure_Bio/Protocols/DNA%20clean-up/UM_NGS_NMag.pdf
- TUFTS – TUCF Genomics. (2018). Retrieved December 5, 2019, from Tufts.edu website: http://tucf-genomics.tufts.edu/home/faq
Abbreviations: HGP, Human Genome Project; NGS, next generation sequencing; QC, quality control; qPCR, quantitative PCR; gDNA, genomic DNA; cDNA, complementary DNA