High-throughput identification of human SNPs affecting regulatory element activity

These GEITP pages have frequently discussed gene expression, which is constantly being altered by environmental signals. These signals trigger a cascade of downstream reactions that begin at the regulatory elements controlling gene expression, causing the various genes to be turned up or down. About 85 million single-nucleotide variants (SNVs) have been identified in human genomes so far and, of course, that number will continue to rise as more genomes are sequenced. [When the first ‘complete human genome’ was published in April 2003 (it was anything BUT complete), I recall one colleague (who shall remain unidentified) asking me, “Is that it? The total mutations in all human genomes are 1400?”]

The vast majority of these SNVs lie in noncoding regions (i.e. DNA stretches that do not encode protein), and each typical human genome carries ~500,000 SNVs whose non-reference alleles overlap regulatory elements such as enhancers [DNA sequences that increase the level of transcription of a gene that is (almost always) located nearby, on the same chromosome] and promoters [regions of DNA (usually ~100–1000 base-pairs long) at which transcription of a particular gene is initiated; promoters are located near the transcription start-sites of genes, ‘upstream’ on the DNA strand]. It has become increasingly clear that these noncoding SNVs can substantially affect gene regulation, thereby contributing to phenotypic diversity (variation in a trait) and to a wide range of human disorders.

Genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) mapping can identify candidate SNVs that may drive, respectively, a particular trait or disorder (e.g. height, body mass index, cancer, obesity, Alzheimer disease) or the expression level of individual genes. Unfortunately, even the largest GWAS and eQTL studies rarely achieve single-SNV resolution, due largely to linkage disequilibrium (LD; i.e. the non-random association of alleles at different loci in a given population). In practice, tens to hundreds of linked SNVs at a single locus are often about equally associated with the trait. Although new fine-mapping techniques, integration with epigenomic data, deep-learning computational methods, and GWAS of extremely large populations can help achieve higher resolution, pinpointing the causal SNVs remains a major challenge.
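To make the LD problem concrete, here is a minimal Python sketch of the standard r² measure between two biallelic loci, computed from phased haplotypes (0 = reference allele, 1 = alternate allele). It is a textbook illustration, not taken from the article; the function name `ld_r2` and the toy haplotypes are hypothetical. When r² is close to 1, a causal SNV and a harmless neighbor are carried on almost the same set of chromosomes, so an association test cannot tell them apart.

```python
from typing import Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Squared correlation (r^2) between two biallelic loci.

    hap_a[i] and hap_b[i] are the alleles (0 = reference, 1 = alternate)
    carried at locus A and locus B on the same phased haplotype i.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("need two equal-length, non-empty haplotype lists")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # alternate-allele frequency at locus A
    p_b = sum(hap_b) / n  # alternate-allele frequency at locus B
    # Frequency of haplotypes carrying the alternate allele at both loci.
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Hypothetical toy haplotypes.
print(ld_r2([0, 0, 1, 1, 1, 0, 1, 0], [0, 0, 1, 1, 1, 0, 1, 0]))  # 1.0: perfectly linked
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0: independent loci
```

A GWAS signal at a SNV with r² near 1 to its neighbors is, statistically, a signal at all of them; a direct functional readout is needed to break the tie.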

A catalog of all SNVs in the human genome that can alter gene regulation would ease this problem. Ideally, the regulatory impact of SNVs would be measured directly, and two high-throughput approaches have been used for this purpose. [1] Changes in chromatin features (such as DNase sensitivity and various histone modifications) have been mapped in lymphoblasts or primary blood cells derived from sets of individuals with fully sequenced genomes. Here, the chromatin marks serve as proxies for effects on regulatory elements, with the caveat that a change in regulatory activity may not always be detected as a change in chromatin state, or vice versa. Furthermore, many traits do not manifest themselves in blood cells, and other cell-types are more difficult to obtain for epigenome mapping. ☹

[2] An alternative functional readout is to insert DNA elements carrying each allele into a reporter plasmid. After these plasmids are transfected into cells, the promoter or enhancer activity of each element can be measured quantitatively, and different cell-types can serve as models for the corresponding tissues in vivo. Large-scale versions of this approach, called massively parallel reporter assays (MPRAs), have been used to screen tens of thousands of SNVs. Each of these studies has yielded tens to, at most, several hundred SNVs that significantly alter promoter or enhancer activity. Because these MPRA studies cover only a tiny fraction of the genome, many more SNVs with regulatory impact likely remain to be discovered.
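The core allelic comparison in such reporter assays can be sketched as follows. Assume each reporter fragment yields one activity value (for example, its RNA count divided by its plasmid DNA count) and that fragments are grouped by whether they carry the reference or the alternate allele. The sketch reports the log2 ratio of the median activities plus a two-sided Mann-Whitney (rank-sum) p-value, using the normal approximation without tie correction. A rank-based test is one common choice; the function names and toy values are hypothetical, and the statistics used in published MPRA studies may differ.

```python
import math
from statistics import median
from typing import Dict, Sequence, Tuple


def mann_whitney(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns (U statistic for sample x, approximate two-sided p-value).
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    values = sorted(list(x) + list(y))
    # Give each distinct value the average of the 1-based ranks it occupies,
    # so tied activities share a rank.
    rank_of: Dict[float, float] = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


def allelic_effect(ref: Sequence[float], alt: Sequence[float]) -> Tuple[float, float]:
    """Log2 ratio of median alt vs. ref reporter activity, and its p-value."""
    if median(ref) <= 0 or median(alt) <= 0:
        raise ValueError("median activities must be positive for a log ratio")
    _, p = mann_whitney(ref, alt)
    return math.log2(median(alt) / median(ref)), p


# Hypothetical per-fragment activities (RNA/DNA ratios) for one SNV.
ref_activity = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
alt_activity = [3.8, 4.1, 4.5, 3.9, 4.2, 4.0]
lfc, p = allelic_effect(ref_activity, alt_activity)
print(f"log2 fold change = {lfc:.2f}, p = {p:.4f}")
```

With these toy values the alternate allele shows roughly four-fold higher activity (log2 fold change of about 2.0, p of roughly 0.004). Across millions of SNVs, such per-SNV p-values would also need multiple-testing correction.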

The authors [see attached article] surveyed the regulatory effects of 5.9 million SNVs in two different cell-types, providing a resource that helps identify causal SNVs among the candidates generated by GWAS and eQTL studies. They leveraged the throughput and resolution of the survey of regulatory elements (SuRE) reporter technology to measure the effects of these SNVs (including 57% of the known common SNVs) on enhancer and promoter activity, and identified more than 30,000 SNVs that alter the activity of putative regulatory elements, usually in a cell-type-specific manner. Integrating this dataset with GWAS results should help pinpoint the SNVs that underlie human traits.
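The integration step can, in its simplest form, be sketched as intersecting a locus's list of LD-linked candidate SNVs with the set of SNVs that showed a significant allelic effect, while keeping track of the cell-types in which each effect was seen. This sketch assumes a plain dictionary from SNV identifier to cell-types; the identifiers, cell-type labels and the function name `prioritize` are hypothetical placeholders, not the format of the article's data release.

```python
from typing import Dict, List, Set, Tuple


def prioritize(candidates: List[str],
               regulatory_snv_cells: Dict[str, Set[str]]) -> List[Tuple[str, List[str]]]:
    """Keep the candidate SNVs that alter regulatory activity, in input order,
    each paired with the sorted cell-types in which the effect was observed."""
    return [(snv, sorted(regulatory_snv_cells[snv]))
            for snv in candidates
            if snv in regulatory_snv_cells]


# Hypothetical example: four SNVs in LD at one GWAS locus.
candidates = ["snv_1", "snv_2", "snv_3", "snv_4"]
regulatory_snv_cells = {
    "snv_1": {"cell_type_B", "cell_type_A"},
    "snv_3": {"cell_type_A"},
    "snv_9": {"cell_type_B"},  # not at this locus, so it is ignored
}
print(prioritize(candidates, regulatory_snv_cells))
# [('snv_1', ['cell_type_A', 'cell_type_B']), ('snv_3', ['cell_type_A'])]
```

Here the four-SNV candidate list shrinks to two, each annotated with the cell-types in which it acts; cell-type specificity can then be matched against the tissue most relevant to the trait.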


Nat Genet July 2019; 51: 1160-1169

This entry was posted in Center for Environmental Genetics.