cDNA vs Genomic DNA

Deoxyribonucleic acid (DNA) is the molecule that carries the instructions for all aspects of an organism’s functions, from growth to metabolism to reproduction. In eukaryotic organisms, most of the DNA resides in tightly coiled structures called chromosomes, in the nucleus of each cell. DNA is built from four different building blocks, called nucleotides. Each nucleotide consists of one of four nitrogenous bases, guanine (G), adenine (A), thymine (T) or cytosine (C), coupled to a deoxyribose sugar. The sugars link to one another through phosphate groups to form long chains, some of which are well over 100,000,000 nucleotides long.

Since each deoxyribose in a DNA chain is coupled to one of the four nitrogenous bases (G, A, T, or C), these long chains can carry information, with groups of three nucleotides forming the smallest, but most well-defined, “words” in the DNA language. These “words” are called codons. Codons specify which amino acids are bonded together to form proteins. For instance, the codon adenine-adenine-guanine (AAG) calls for the amino acid lysine (Lys) to be incorporated into a protein molecule. The codon AGG calls for the amino acid arginine (Arg). So the sequence AAG-AGG would call for one Lys to be coupled to one Arg in a growing protein chain. There are also codons that, under the right circumstances, call for a protein to begin to be formed (start codons) or for a protein chain to be finished (stop codons). As this simple example suggests, a chain that is millions of nucleotides long can carry a massive amount of information.
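
To make the idea of codons as “words” concrete, here is a minimal Python sketch that reads a DNA sequence three letters at a time. It is only an illustration: the table holds just the two codons mentioned above plus the standard start codon (ATG) and the three standard stop codons, and the names CODON_TABLE and translate are made up for this example.

```python
# Toy codon table (an assumption for illustration): only the codons
# discussed in the text, plus the standard start and stop codons.
CODON_TABLE = {
    "AAG": "Lys",
    "AGG": "Arg",
    "ATG": "Met",   # methionine, also the usual start codon
    "TAA": "Stop",
    "TAG": "Stop",
    "TGA": "Stop",
}


def translate(dna):
    """Read a coding-strand DNA sequence one codon (three letters) at a time."""
    protein = []
    usable_length = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    for i in range(0, usable_length, 3):
        codon = dna[i:i + 3].upper()
        amino_acid = CODON_TABLE.get(codon, "???")  # codon missing from the toy table
        if amino_acid == "Stop":
            break  # a stop codon ends the protein chain
        protein.append(amino_acid)
    return "-".join(protein)


print(translate("AAGAGG"))        # Lys-Arg
print(translate("ATGAAGAGGTAA"))  # Met-Lys-Arg
```

A real translation table has 64 codons, so a complete version would only need a bigger dictionary, not different logic.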

The DNA that resides in the chromosomes inside the nucleus, and that includes all the biological information transferred to the next generation, is called genomic DNA (gDNA). The words “genome” and “genomic” come from the word “gene”. A gene is a set of codons that specifies a particular protein chain, along with the associated start and stop codons. The word genome extends this concept: it means the collection of all genes and other information contained in the DNA inside the nuclei of an organism’s cells.

Complementary DNA, or cDNA, is either produced by some viruses or synthesized in laboratories. In nature, information from DNA makes its way out of the nucleus of a cell and into proteins through an intermediary molecule called messenger RNA (mRNA). mRNA is produced when enzymes inside the nucleus bind to specific genes and copy their information into RNA. RNA is very similar to DNA: it is also built from four nucleotides, but it uses the base uracil (U) instead of thymine and the sugar ribose instead of deoxyribose. RNA chains are generally short-lived and degrade easily. That makes them well suited to conveying information from the chromosomes in the nucleus to the machinery that makes proteins and then being destroyed.

Initially, it was observed that gDNA was always read and transcribed into mRNA, which guided protein formation and was then disposed of. The notion that information might always flow from DNA to RNA to protein was somewhat jokingly referred to as the Central Dogma of molecular biology. Calling it that challenged scientists to find exceptions to this rule. Virologists eventually found one such exception. Retroviruses, such as HIV, have mechanisms for “reverse transcribing” RNA. This means that they can take RNA chains and produce DNA chains from them. During reverse transcription, information flows “backwards” from RNA to DNA, which contradicted the strictly one-way version of the Central Dogma. DNA that arises from this process is called complementary DNA (cDNA).

When scientists use viral enzymes to make cDNA from RNA isolated from the cells and tissues they are studying, the resulting cDNA does not contain introns, which are nucleotide sequences that are “edited out” of RNA before it can be used to make proteins. cDNA also lacks the gDNA that is never transcribed into mRNA, and there is a lot of DNA in chromosomes that does not get read to make proteins. Lastly, not all genes in the gDNA are being transcribed into mRNA at any given time, so cDNA will only contain those genes that are actively being used by a specific cell or tissue at a given point in time. There is much less total information in cDNA than in gDNA, but the information that remains can be a lot more relevant to what a researcher is looking at.
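
The two directions of information flow described above can be sketched in a few lines of Python. This is a simplified illustration, not a model of the actual enzymes: the function names are made up, strand direction is ignored, and each sequence is simply written so that paired bases sit at the same position.

```python
# Base-pairing rules. RNA uses uracil (U) where DNA uses thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template DNA base -> mRNA base
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}  # mRNA base -> cDNA base


def transcribe(template_dna):
    """DNA -> RNA: build the mRNA that pairs with a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template_dna)


def reverse_transcribe(mrna):
    """RNA -> DNA: build the cDNA strand that pairs with an mRNA."""
    return "".join(RNA_TO_DNA[base] for base in mrna)


template = "TTCTCC"               # hypothetical template strand
mrna = transcribe(template)
cdna = reverse_transcribe(mrna)

print(mrna)  # AAGAGG
print(cdna)  # TTCTCC
```

In this toy case the cDNA comes out identical to the original template strand, which is the sense in which reverse transcription carries the information in RNA back into DNA.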

Once isolated, gDNA can be used to make genomic libraries and for DNA sequencing, fingerprinting, and other applications. To make cDNA, the RNA of an organism must first be isolated. Then, using a reverse transcriptase enzyme, the cDNA can be synthesized from that RNA. cDNA can also be used to make cDNA libraries (permanent collections of cDNA that can be copied and/or stored long term), and it is commonly used to clone eukaryotic genes in a prokaryote. This way, a protein expressed in a eukaryotic organism can be produced in a prokaryote. For this process cDNA is used rather than gDNA, since cDNA does not contain introns, which prokaryotes cannot edit out. cDNA can loosely be thought of as the gDNA of a gene with its introns removed. It gets its name, complementary DNA, because it is synthesized as a strand complementary to the RNA template.
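
The difference between a gene as it sits in gDNA and the same gene as cDNA can be shown with a toy example. Everything in this Python sketch is hypothetical: the sequences are invented, the gene has only two exons and one intron, and the helper name splice is made up. It only illustrates that the cDNA version keeps the exons and drops the intron.

```python
# Hypothetical toy gene: two exons separated by one intron.
EXON_1 = "ATGAAG"
INTRON = "GTAAGTTTTCAG"   # invented sequence, removed before translation
EXON_2 = "AGGTAA"

# The genomic version of the gene contains the intron.
genomic = EXON_1 + INTRON + EXON_2

# (start, end) positions of each exon within the genomic sequence.
exon_ranges = [
    (0, len(EXON_1)),
    (len(EXON_1) + len(INTRON), len(genomic)),
]


def splice(sequence, exons):
    """Join only the exon segments, dropping everything in between."""
    return "".join(sequence[start:end] for start, end in exons)


cdna_like = splice(genomic, exon_ranges)

print(len(genomic), genomic)      # 24 ATGAAGGTAAGTTTTCAGAGGTAA
print(len(cdna_like), cdna_like)  # 12 ATGAAGAGGTAA
```

The shorter sequence reads as four uninterrupted codons (ATG, AAG, AGG, TAA), which is the kind of sequence a prokaryote can use, since it cannot remove the intron itself.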

Retroviruses can use their cDNA to produce mRNA in the host, leading to the production of viral proteins. This is possible because retroviruses use RNA rather than DNA as their genomic material. The viral RNA is reverse transcribed into cDNA, which then undergoes normal transcription in the host and leads to the production of viral proteins.

Another interesting fact is that the US Supreme Court decided in 2013 (Association for Molecular Pathology v. Myriad Genetics) that, because cDNA is not naturally occurring, it can be patented, while gDNA, being naturally occurring and not invented, cannot.

Medicinal Chemistry PhD candidate from The University of Toledo.