Mapping and characterization of structural variants (SVs) in 17,795 human genomes

As a previous GEITP email described, in the early days of genetic studies, the concept (learned in grade school 😊) was that “DNA is transcribed into RNA, which is translated into protein.” Then, it was established that “DNA → primary transcript, which then yields messenger RNA (mRNA) via splicing; then mRNA (from the DNA coding region) → protein.” An early belief was the “one gene = one gene product (protein)” concept. Now, we know the primary transcript can exhibit dozens of start sites and termination sites, leading to numerous structural variants (SVs), often resulting in different proteins (many exhibiting distinct functions).


Studies of human genetics use whole-genome sequencing (WGS) to enable comprehensive trait-mapping analyses across the full diversity of genome variation, including SVs [defined as variants of 50 base pairs (bp) or greater, such as insertions, deletions, duplications, inversions, and other rearrangements]. SVs appear to have a disproportionately large role (relative to their abundance) in the biology of rare diseases, and in shaping heritable differences in gene expression in human populations. Rare and de novo SVs have been implicated in the genetics of autism and of schizophrenia, but few other complex-trait association studies have directly assessed SVs. One challenge for the interpretation of SVs in WGS-based studies is the lack of high-quality, publicly available variant maps from large populations.
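To make the size-based definition above concrete, here is a minimal sketch in Python of how one might flag a variant call as an SV using the 50-bp threshold. The `Variant` record, its field names, and the helper functions are hypothetical and exist only for illustration; they are not taken from the article or from the authors' pipeline.

```python
from dataclasses import dataclass

# Size threshold quoted in the text: variants of 50 bp or greater count as SVs.
SV_MIN_LENGTH_BP = 50


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record (not the authors' data model)."""
    chrom: str
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    kind: str         # e.g. "DEL", "DUP", "INV", "INS"
    ins_len: int = 0  # inserted sequence length; only used when kind == "INS"


def variant_length(v: Variant) -> int:
    """Return the variant size in bp.

    Insertions occupy a single reference position, so their size is the
    length of the inserted sequence; other types use the reference span.
    """
    if v.kind == "INS":
        return v.ins_len
    return v.end - v.start + 1


def is_structural_variant(v: Variant, min_len: int = SV_MIN_LENGTH_BP) -> bool:
    """True when the variant meets the size threshold for an SV."""
    return variant_length(v) >= min_len


calls = [
    Variant("chr1", 1000, 1049, "DEL"),             # 50 bp deletion
    Variant("chr1", 2000, 2009, "DEL"),             # 10 bp deletion (small indel)
    Variant("chr2", 5000, 5000, "INS", ins_len=300),
]
sv_calls = [v for v in calls if is_structural_variant(v)]
```

In this toy list, the 50-bp deletion and the 300-bp insertion pass the threshold, whereas the 10-bp deletion would be treated as a small indel rather than an SV.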


Authors [see attached article] used a scalable pipeline to map and characterize structural variants in 17,795 deeply sequenced human genomes, and they have publicly released these site-frequency data to create the largest WGS-based SV resource to date. On average, each individual genome contains ~2.9 rare SVs that alter coding regions; these SVs affect the dosage or structure of ~4.2 genes and account for 4.0–11.2% of rare high-impact coding alleles. Using a computational model, authors estimated that SVs account for 17.2% of rare alleles genome-wide with predicted deleterious effects equivalent to those of loss-of-function (LoF) coding alleles; ~90% of such SVs are noncoding deletions (with a mean of 19.1 per genome).


Authors reported 158,991 ultra-rare structural variants (u-rSVs) and showed that ~2% of individuals carry megabase-scale u-rSVs, nearly half of which are balanced or complex rearrangements. Lastly, authors inferred the dosage sensitivity of genes and noncoding elements, and identified trends that relate to element class and conservation. This study will help guide the future analysis and interpretation of SVs in the era of WGS. 😊





Nature 2 Jul 2020; 583: 83-89
