Along any chromosomal segment, the DNA is divided into genes and intergenic regions. Intergenic regions include promoters and enhancers, which usually regulate the expression of a nearby gene (but sometimes of distant genes), as well as many other structures beyond the scope of this topic. Genes include exons, which contribute to the messenger RNA (mRNA) that in turn gets translated into the protein (the usual gene product). Genes also include introns (between exons), which are transcribed into RNA but get spliced out during formation of the mRNA; introns can also contain regulatory elements. In whole-genome sequencing (WGS), the entire DNA sequence is determined, whereas in whole-exome sequencing (WES), only the DNA sequence of the exons is determined.
The exome (the target of WES) represents 1.5% to 2.0% of the genome, while WGS purportedly covers the entire haploid genome (the DNA contained in linear fashion along one chromosome of each pair); in reality, WGS today covers 95-98% of the entire genome. “Coding variants” are single-nucleotide variants (SNVs; SNPs) located in exons that either alter (nonsynonymous) or do not alter (synonymous) the protein sequence.
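To make the distinction concrete, here is a minimal Python sketch that labels a single-nucleotide change within one codon as synonymous (same amino acid) or nonsynonymous (different amino acid, or a stop codon). The function name `classify_coding_snv` and the example codons are illustrative assumptions and are not taken from the article; the lookup uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each
# position (third base varies fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snv(ref_codon: str, alt_codon: str) -> str:
    """Label a one-base codon change (hypothetical helper for illustration).

    Returns "synonymous" if the encoded amino acid is unchanged,
    otherwise "nonsynonymous".
    """
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if ref not in CODON_TABLE or alt not in CODON_TABLE:
        raise ValueError("codons must be three DNA bases (A, C, G, T)")
    # An SNV changes exactly one nucleotide of the codon.
    if sum(r != a for r, a in zip(ref, alt)) != 1:
        raise ValueError("codons must differ at exactly one position")
    same_protein = CODON_TABLE[ref] == CODON_TABLE[alt]
    return "synonymous" if same_protein else "nonsynonymous"


if __name__ == "__main__":
    print(classify_coding_snv("CTT", "CTC"))  # Leu -> Leu
    print(classify_coding_snv("GAG", "GTG"))  # Glu -> Val
```

A real annotation pipeline would also need the transcript, strand and reading frame of each variant; this sketch assumes the reference and variant codons are already known.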
At least 325 million DNA single-nucleotide variants (SNVs; SNPs) have (so far) been identified in the human genome, 15 million of which are present at frequencies of 1% or higher across different populations worldwide. “SNP-chips” detect only individual nucleotide markers (there are ~3.2 billion nucleotides in a human haploid genome), meaning that the 1 million or 5 million “SNPs” on a “chip” represent a far more preliminary “scan/screen” than the WGS-determined DNA sequence of the entire haploid genome. In any DNA analysis, it therefore follows that WGS is more complete than WES, and that SNP-chip analysis provides the least information.
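The relative completeness of the three approaches can be checked with simple arithmetic. The following Python sketch uses only the round figures quoted above (~3.2 billion nucleotides per haploid genome, 1 million or 5 million chip markers, 1.5% to 2.0% for the exome, 95-98% for WGS); the constant and function names are illustrative assumptions.

```python
# Approximate haploid genome size quoted in the text (~3.2 billion nucleotides).
HAPLOID_GENOME_BP = 3_200_000_000


def fraction_of_genome(n_sites: int) -> float:
    """Fraction of the haploid genome interrogated by n_sites single positions."""
    return n_sites / HAPLOID_GENOME_BP


def bases_covered(fraction: float) -> float:
    """Number of nucleotides corresponding to a given fraction of the genome."""
    return fraction * HAPLOID_GENOME_BP


if __name__ == "__main__":
    # SNP-chips: each marker interrogates a single nucleotide position.
    for n_markers in (1_000_000, 5_000_000):
        pct = 100 * fraction_of_genome(n_markers)
        print(f"SNP-chip, {n_markers:,} markers: {pct:.3f}% of the genome")

    # WES: the exome is quoted as 1.5% to 2.0% of the genome.
    low, high = bases_covered(0.015), bases_covered(0.020)
    print(f"WES: {low / 1e6:.0f} to {high / 1e6:.0f} million nucleotides")

    # WGS: quoted as covering 95-98% of the genome in practice.
    low, high = bases_covered(0.95), bases_covered(0.98)
    print(f"WGS: {low / 1e9:.2f} to {high / 1e9:.2f} billion nucleotides")
```

On these figures, a 1-million-marker chip samples roughly 0.03% of nucleotide positions and a 5-million-marker chip roughly 0.16%, whereas the exome corresponds to about 48 to 64 million nucleotides.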
SNVs (or SNPs) can have a large effect, an intermediate or modest effect, or a small effect on any trait (phenotype); silent SNVs produce no apparent effect at all. This article [attached] describes rare coding variants (detected by WES) that have a large effect on the trait [in this case, esophageal squamous cell carcinoma (ESCC)]. The authors carried out WES on 3,714 individuals with ESCC and 3,880 controls, searching for low-frequency susceptibility loci. Two independent replication samples comprised 7,002 cases and 8,757 controls. The authors found six new susceptibility loci in the CCHCR1, TCN2, TNXB, LTA, CYP26B1 and FASN genes.
CYP26B1 is an especially intriguing (credible) example, because it encodes a retinoic acid-metabolizing enzyme: individuals with the SNV show significantly lower serum levels of all-trans retinoic acid (an important anti-cancer nutrient) than those having the consensus nucleotide, who appear to be more protected against ESCC. The higher risk of ESCC is thus likely due to an enhanced capacity of this variant CYP26B1 to break down all-trans retinoic acid. This exciting study emphasizes the importance of rare coding variants that appear to be associated with a very complex multifactorial trait such as esophageal squamous cell carcinoma. Note that, in this study, the environment (heavy use of alcohol and smoking) also contributes to this particular cancer phenotype.
Nature Genet Mar 2018; 50: 338–343