As often described in these GEITP pages (archived articles can be found at https://genewhisperer.com/, thanks to the generous time and energy of Marian Miller), the human haploid genome (i.e. DNA from one copy of each chromosome) comprises ~3.2 billion nucleotides; the diploid genome (DNA from both copies of each chromosome) is twice that. While actual genes (protein-coding regions) represent <2.0% of the total genome, relatively little is known about the functional consequences of variation in the remaining 98% of the genome (the noncoding genome), and this is the subject of the attached publication. There is a remarkable degree of evolutionary conservation of many of these noncoding regions among human, mouse, bird and fish (i.e. although the chromosomes are all jumbled, in many cases adjacent genes in fish amazingly remain adjacent to one another in mammals!). With whole-genome sequencing (WGS) studies now available for so many species, noncoding regions have been annotated through the ENCODE Project, which relies on identification of biochemically active elements in the human genome, with attention paid to regulatory elements that control gene activity. Regulatory control is also influenced by higher-order chromatin structure along each chromosome. In support of a role for noncoding variants in human disease and phenotypic traits, most of the >16,000 common variants identified through genome-wide association studies (GWAS) lie in noncoding regions of the genome (http://www.ebi.ac.uk/gwas).
GWAS variants are increasingly being recognized as acting through changes in the regulatory circuitry. A combination of species-conservation and human-variation data has now been used to identify regions of proximal noncoding sequence (untranslated regions; UTRs) in order to infer gene-dosage sensitivity. However, despite recent progress in the study of noncoding variants, characterizing the noncoding variants in the human genome remains a substantial challenge (their number grows by >8,000 with each additional genome sequenced!).
To characterize the population variation in noncoding regions further, authors [see attached] performed a comprehensive analysis of 11,257 whole-genome sequences and 16,384 heptamers (i.e. all possible 7-nucleotide motifs) to build a map of sequence constraint for the human species. They applied an approach that exploits the contribution of thousands of elements (DNA modules) in thousands of genomes. Meta-profiles can integrate sequence variation and frequency across genomic landmarks that share the same sequence, structure or function. Authors discovered strong patterns of coordination within ~2 million base pairs (~2 Mb) of a gene, where the most constrained regulatory elements interact with the most essential genes. Constrained regions of the noncoding genome (i.e. DNA segments that show less sequence variation across the population than expected, implying that changes there are poorly tolerated) are as much as 52-fold enriched for known disease-related variants, as compared to unconstrained regions (21-fold, when compared to the genome average). This exciting map of sequence constraint across thousands of individuals should become an asset to help interpret noncoding elements in the human genome, prioritize single-nucleotide variants (SNVs), and reconsider gene units at a larger scale!
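For readers wondering where the number 16,384 comes from: with four possible nucleotides (A, C, G, T) at each of seven positions, there are 4^7 = 16,384 possible heptamers. The short Python sketch below simply enumerates them and counts how often each occurs in a toy sequence. It is an illustration only, not the authors' pipeline; the function names (`all_kmers`, `count_kmers`) and the toy sequence are made up for this example.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"  # the four DNA nucleotides
K = 7           # heptamer = 7-nucleotide motif


def all_kmers(k=K):
    """Return every possible DNA motif of length k (4**k of them)."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def count_kmers(seq, k=K):
    """Count each k-mer seen in a sliding window along seq.

    Windows containing anything other than A, C, G or T
    (e.g. the ambiguity code N) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(BASES):
            counts[kmer] += 1
    return counts


heptamers = all_kmers()
print(len(heptamers))  # number of possible heptamers, 4**7

toy_sequence = "ACGTACGTAC"  # hypothetical 10-nucleotide example
toy_counts = count_kmers(toy_sequence)
for kmer, n in sorted(toy_counts.items()):
    print(kmer, n)
```

In a real analysis, counts like these would be tallied across the whole genome and combined with the observed variation at each motif; how the authors did that is described in the attached publication, not here.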
Nature Genet Mar 2018; 50: 333–337