Thank you, Magnus, for sharing your recent review. Thank you, Ge, for providing us with your comments on this recently published study.
For those not as “learned” in “missing heritability,” let me add a few thoughts. Traits that show phenotypic variation among individuals in any population can be classified as: [1] monogenic (Mendelian) traits, typically influenced by one or a few rare coding variants; [2] predominantly oligogenic traits, in which variability is largely elicited by a small number of major genes; and [3] complex traits, produced mostly by innumerable small-effect variants (e.g., height, blood pressure, autism spectrum disorder).
Variance explained by single-nucleotide variants (SNVs) will be some subset of the total 100% of phenotypic variation. Let’s pick an imaginary scenario in which 12,111 SNVs together reflect 40% of variance explained. If 11 major SNVs were each to reflect 1% of variance explained (11% in total), then the remaining 12,100 SNVs, combined, would reflect the remaining 29% of variance explained. I hope this makes the topic clearer 😊, rather than more muddled. ☹
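For anyone who wants to check the arithmetic of this imaginary scenario, here is a minimal Python sketch. The variable names are illustrative, and the figures are the hypothetical ones used above, not results from the study.

```python
# Hypothetical split of "variance explained", using the imaginary figures above.
total_snvs = 12_111
total_variance_pct = 40.0   # variance explained by all SNVs together (%)
major_snvs = 11
major_effect_pct = 1.0      # each imaginary "major" SNV explains 1%

# Whatever the major SNVs do not account for is shared by the rest.
remaining_snvs = total_snvs - major_snvs
remaining_variance_pct = total_variance_pct - major_snvs * major_effect_pct
mean_small_effect_pct = remaining_variance_pct / remaining_snvs

print(f"{remaining_snvs:,} SNVs share {remaining_variance_pct:.0f}% of variance explained")
print(f"average per small-effect SNV: {mean_small_effect_pct:.4f}%")
```

Under these assumed numbers, each of the 12,100 small-effect SNVs would account, on average, for only a few thousandths of one percent of the phenotypic variation.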
DwN
From: Ge Zhang
Sent: Wednesday, November 9, 2022 3:35 PM
Dear Dr. Nebert,
I’ve been too busy to read the full article. I glanced at the abstract and the main results, and I can say that the sample size was astonishingly large (5.4 million) and the phenotypic variance explained (40–45% in Europeans) almost reached (saturated) the estimated SNV-based heritability. These are amazing achievements; however, they are also the results one would expect.
I think that this type of study (detecting small-effect variants, and improving variance explained by “dramatically” increasing sample size) cannot be replicated with other phenotypes, given that height might be the “simplest” complex trait to study. This is because it is easy to measure, stable, “robust” (i.e., not sensitive to environmental factors in well-nourished populations), and also exhibits high heritability.
In studies of other complex traits, the benefit of sheer increases in sample size may not be that obvious, because the amount of heterogeneity added might offset the benefits of a large sample size. Below is a slide that I used in a recent lecture on the genetic architecture of complex traits. For several other complex traits whose genome-wide association studies (GWAS) had cohort sizes of more than one million, the variance explained reached only 5–16% of total phenotypic variation.
Best, Ge
Professor of Human Genetics, Cincinnati Children’s Hospital Research Center, Ohio
From: Magnus Ingelman-Sundberg
Sent: Sunday, November 6, 2022 2:14 PM
Hi Dan
Many thanks for these articles. I find this topic very exciting.
In fact, I recently wrote a review for Trends in Pharmacol Sci (TiPS) that includes a lot of information about missing heritability in “absorption-disposition-metabolism-excretion” (ADME) research. For anyone interested, the preprint is attached (extreme right).
—Best, Magnus
From: Nebert, Daniel (nebertdw)
Sent: Friday, November 4, 2022 6:02 PM
This topic is central to gene-environment interactions. What is “Missing Heritability?” Ge Zhang and I have written about this topic several times — most recently in an invited review published in 2017 [see below].
A genome-wide association study (GWAS) is a research approach used to identify genomic variants that are statistically associated with a risk for a disease or a particular trait (e.g., type-2 diabetes, autism spectrum disorder, facial features, height, or response to a drug). The method involves screening entire genomes of many people, looking for genomic variants that occur more frequently in those with a specific disease or trait — compared to those without the disease or trait. Once such genomic variants are identified, they are typically used to search for nearby variants that contribute directly to the disease or trait (by which better therapy, or improved risk assessment, might be achieved).
Personalized medicine: Genetic risk prediction of drug response
Ge Zhang a, Daniel W. Nebert a,b,
a Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229-3039, United States
b Department of Environmental Health and Center for Environmental Genetics, University of Cincinnati School of Medicine, Cincinnati, OH 45267-0056, United States
Pharmacology & Therapeutics 175 (2017) 75–90
Genome-wide association studies (GWAS)
The earliest GWAS was probably published in 2002, in which an association between the LTA gene and myocardial infarction was determined by typing almost 93,000 gene-based SNP markers (Ozaki, et al., 2002). Another early GWAS, typing more than 116,000 SNPs, reported an association between the CFH gene and age-related macular degeneration (Klein, et al., 2005). More recently, easy-to-use DNA chips containing 1 million to 5 million SNPs have become readily available. The GWAS field has expanded exponentially; today, >24,000 SNP-trait associations have been reported in >2,500 studies [https://www.ebi.ac.uk/gwas/]. These robust GWAS, having P-values ranging from <10⁻⁸ to <10⁻⁴⁰⁰, underscore the value of using more stringent statistical significance levels when one is studying >1 million SNPs in large cohorts containing thousands, or even hundreds of thousands, of samples.
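The stringency described above is commonly motivated by a Bonferroni-style correction: dividing a conventional 0.05 level by roughly one million tests gives the familiar genome-wide threshold of 5 × 10⁻⁸. The following is a minimal Python sketch of that arithmetic; the function name and the test counts in the loop are illustrative assumptions, not something taken from the review.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Illustrative numbers of SNPs tested: a single candidate SNP, an early
# ~116,000-SNP scan, and chips carrying 1 million and 5 million SNPs.
for n_tests in (1, 116_000, 1_000_000, 5_000_000):
    threshold = bonferroni_threshold(0.05, n_tests)
    print(f"{n_tests:>9,} tests -> per-test threshold {threshold:.1e}")
```

The correction treats every test as independent, which is conservative for SNPs in linkage disequilibrium; it is shown here only to illustrate why a single-SNP P < 0.05 is not comparable to a genome-wide result.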
GWAS quickly became far more reliable for genotype-phenotype association tests when compared with studies involving one or several SNPs in small cohorts of several dozen or even several hundred individuals. These latter publications of (type-I and type-II error) artifacts have variously been called “the incidentalome” (Kohane, Masys, & Altman, 2006) and “the P <0.05 false-positive studies” (Nebert, Zhang, & Vesell, 2008).
It should be recognized that multiple parameters (effect-size, allelic frequency, significance level, sample size) will all affect statistical power for any genotype-phenotype association study. Statistical power obviously improves with larger numbers of cases and controls. As any minor allele frequency (MAF) increases, fewer subjects are usually needed in the study group, and the level of detectable contribution by a genetic variant to a phenotype will be lower. If the MAF is low, greater numbers per group will be required, and the level of detectable contribution by a variant to a phenotype will be higher. However, these statements do not always hold. For example, when the contribution (effect-size) of the risk variant is measured by heritability, or variance explained, influence of the MAF on statistical power of an association study is not significant. Nevertheless, if effect-size is measured by odds ratio (OR) or genotypic relative risk (GRR), the influence of a MAF will be greater, especially for MAFs <0.05. In addition to the MAF and prevalence of a disorder, the inheritance model (“additive,” “dominant,” or “recessive”) of a risk allele can also influence power of an association study.
Virtually never do GWAS have sufficient statistical power to detect epistasis (gene × gene interactions; G×G) (Bhattacharjee, et al., 2010; Sackton & Hartl, 2016) or gene-environment (G×E) interactions (D. Thomas, 2010). Moreover, an additional underappreciated class of genetic interactions is intergenic compound heterozygosity, i.e. interactions between multiple rare variants contributing to a trait (Gibson, 2011). Such interactions encompass heterozygous combinations of multiple alleles. In a broad sense, if hundreds of mutations, each having a frequency of <0.1%, all contribute to the phenotype, then such an event could contribute substantially to individuals that are heterozygous at many of these loci.
“Missing heritability”: real or imagined?
Twenty years after launching the Human Genome Project in October 1990, initial GWAS findings became frustrating to those who wanted clear-cut explanations of the etiology of a complex disease (D. B. Goldstein, 2009). For most complex diseases, even multiple GWAS variants considered together (e.g., using polygenic risk score) typically explain too little variability in disease occurrence to be of much predictive value (Manolio, 2013; Wray, et al., 2013). However, more importantly, some GWAS data had identified potential novel therapeutic targets for treating a complex disease without knowing its precise etiology. Similarly, some GWAS data might uncover potential therapeutic targets for treating an environmental disease, by learning something about its mode-of-action without necessarily understanding its precise mechanism-of-action.
GWAS often explain only a small proportion of heritability (defined as additive genetic variance). The absent proportion became known as “missing heritability” (Lander, 2011; Manolio, et al., 2009), leading to renewed interest in the genetic architecture of human complex disease and traits (Gibson, 2011; Ge Zhang, 2015), a topic that had been extensively debated in the early 20th century. First, it was realized that heritability attributable to some common variants could be substantial.
However, as GWAS cohort sizes continued to increase, thereby identifying additional variants contributing smaller effects to the trait, the “revealed heritability” continued to grow, albeit rarely reaching more than 20–25% for various diseases and traits (Lander, 2011). In addition, current GWAS may overlook many variants of lower frequency (MAFs 1–5%), because existing SNP-typing arrays often lack a more useful marker. Many complex disease-related alleles are probably included in this frequency class. Newer genotyping arrays and imputation methods, based on the 1000 Genomes Project (1000 Genomes Project Consortium, et al., 2012; 1000 Genomes Project Consortium, et al., 2015) or The Haplotype Reference Consortium (S. McCarthy, et al., 2016), are able to capture these less frequent variants. This topic is discussed in detail later. GWAS also miss many common small-effect variants, due to limited sample-size and/or stringent statistical thresholds imposed to ensure reproducibility. Efforts seeking to infer contributions of loci that fall just short of statistical significance were then addressed (Park, et al., 2010; J. Yang, et al., 2010). Contributions of loci that fall even further short of statistical significance, likewise, will result in even smaller effect-sizes on phenotype. Although their individual contributions may be too small ever to detect by investigators designing feasible sample cohort sizes, these very-small-effect variants collectively will probably also explain a significant fraction of heritability (Gibson, 2010; Lander, 2011). Furthermore, rare variants of large effect will sometimes contribute substantially to common diseases, although their roles are only recently being explored.
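To make the point about sample size and very-small-effect variants concrete, here is a rough Python sketch of statistical power for a single variant affecting a quantitative trait. It rests on assumptions that are not in the review: a 1-degree-of-freedom additive test whose non-centrality parameter is approximated as N·q²/(1 − q²), where q² is the fraction of phenotypic variance explained by the variant, and a genome-wide threshold of 5 × 10⁻⁸. The function name `gwas_power` is hypothetical.

```python
from statistics import NormalDist


def gwas_power(n: int, variance_explained: float, alpha: float = 5e-8) -> float:
    """Approximate power of a 1-df additive test for a quantitative trait.

    Assumption: the test statistic is chi-square (1 df) with non-centrality
    n * q2 / (1 - q2), where q2 is the variance explained by the variant.
    """
    if not 0.0 <= variance_explained < 1.0:
        raise ValueError("variance_explained must be in [0, 1)")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    shift = (n * variance_explained / (1.0 - variance_explained)) ** 0.5
    # P(|Z + shift| > z_crit) for a standard normal Z
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Cohort sizes: two illustrative ones, plus the 700,000 and 5.4 million
# mentioned in this thread; variance-explained values are hypothetical.
for n in (10_000, 100_000, 700_000, 5_400_000):
    row = ", ".join(
        f"q2={q2:g}: {gwas_power(n, q2):.3f}" for q2 in (1e-2, 1e-3, 1e-4, 1e-5)
    )
    print(f"N={n:>9,}  {row}")
```

Note that MAF does not appear in this approximation: once effect-size is expressed as variance explained, power depends only on that quantity, the sample size, and the significance level, which is consistent with the remark in the review excerpt about MAF having little influence in that case.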
Our understanding of rare variants has now advanced via whole-exome sequencing (WES) (Bertier, Hetu, & Joly, 2016) and whole-genome sequencing (WGS), also called “next-generation” sequencing (NGS) (Haimovich, Muir, & Isaacs, 2015; Pinto, Ariani, Bianciardi, Daga, & Renieri, 2016). Whether rare variants will lead to discovery of a substantial number of new genes is a question that perhaps can be answered by systematic WES or WGS. Given the background rate of rare variants, many thousands of samples will be required to achieve statistical significance. Correspondingly, how to quantify total heritability due to rare variants remains unclear. Although the inferred effect-sizes are larger, overall contribution to heritability may be small because of their low frequencies (Gibson, 2011; Ge Zhang, 2015). Lastly, some “missing heritability” might be purely an illusion (Lander, 2011) because heritability is estimated from epidemiological data by applying principles for inferring additive genetic effects. These estimates may be too high, because the methods are not effective at excluding nonlinear contributions of G×G interactions, G×E interactions, or epistasis, which are likely to be important.
This brings us, 5 years later, to this recently published article and editorial [see attached]. Since 2007, “height” has often been studied as a convenient polygenic multifactorial trait; increasing the cohort size has increased the amount of heritability uncovered (“variance explained”), albeit still far short of 100%. Some have suggested that a GWAS that includes all adult humans on earth would still be insufficient to reach 100% of “variance explained”…!! Until now, the largest GWAS published for adult height had reported 3,290 independent associations in 712 loci, using a cohort size of 700,000 individuals (Yengo et al., 2018).
Adult height (highly heritable and easily measured) has provided a larger number of common genetic associations than any other human phenotype. In addition, a large collection of genes has been implicated in disorders of skeletal growth, and these are enriched in loci mapped by GWAS of height in the normal range; these features thus make height an attractive model trait for assessing the role of common genetic variation in defining the genetic and biological architecture of polygenic human phenotypes. As available sample sizes continue to increase for GWAS of common variants, it becomes important to consider how much more heritability can be uncovered. Moreover, because most GWAS continue to be performed largely in populations of European ancestry, it is necessary to address these questions of “completeness” in the context of multiple ancestries. Finally, some have proposed that, when sample sizes become sufficiently large, effectively every gene and genomic region will be implicated by GWAS, rather than certain subsets of genes and biological pathways being specified.
In the attached article just published, using data from a GWAS of 5.4 million individuals of diverse ancestries, authors show that 12,111 independent single-nucleotide variants (SNVs) that are significantly associated with height account for nearly all of the common SNV-based heritability; these SNVs are clustered within 7,209 non-overlapping genomic segments with a mean size of ~90 kilobases (kb), covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. This combined analysis of 281 GWAS not only found 12,111 common DNA variants associated with a person’s height, but also shows that larger studies will not yield more variants in populations of European ancestry.
Authors have therefore demonstrated that it is possible to achieve saturation for complex traits (however, cohort size might have to be in the millions to do so…!!). Ancestrally, ethnically, globally and socio-economically diverse samples are now necessary to reap the full benefits of GWAS. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries. 😊😊😊
DwN
Nature 27 Oct 2022; 610: 704-712 & editorial pp 631-632