A Fast and Accurate Method for Genome-wide Scale Phenome-wide G × E Analysis and Its Application to UK Biobank

The etiology of most complex diseases (i.e. multifactorial traits) involves genetic variants, environmental factors, and gene-environment (GxE) interaction effects. Because many genes of small effect contribute to such traits, GxE analysis requires more samples than marginal genetic association studies, as well as detailed (i.e. very accurate) measurements of environmental exposures; these requirements limit the discoveries that are possible. Large-scale population-based biobanks with detailed phenotypic and environmental information, such as UK Biobank, can be ideal resources for identifying GxE effects. However, because of large computation costs and the presence of case-control imbalance (e.g. having, say, 10 cases and 10,000 controls), existing methods often fail.

The authors [see attached article] propose a scalable and accurate method called “SPAGE” (SaddlePoint Approximation implementation of GxE analysis), which is applicable to genome-wide-scale, phenome-wide GxE studies [phenome-wide association studies (PheWAS) are designed to search for associations between single-nucleotide variants (SNVs) and a large number of different traits]. To reduce computation costs, SPAGE fits a genotype-independent logistic model only once across the genome-wide analysis. SPAGE uses a saddlepoint approximation (SPA) [sorry, but this ‘method of steepest descent’ is too complicated for these GEITP pages to try to explain ☹] to calibrate the test statistics for analysis of phenotypes with unbalanced case-control ratios.
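
For readers who like to see the idea in code, here is a minimal sketch of why fitting the genotype-free model "only once" saves so much time. It is a generic score test, not the authors' SPAGE implementation: the function names, the simulated data, and the choice of covariates are all assumptions made for illustration, and the sketch ignores the main effect of each variant, which a real GxE test would have to account for. The null model is fit a single time; each variant then costs only a few matrix-vector products.

```python
import math
import numpy as np

def fit_null_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic model of y on covariates X (no genotype) by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = mu * (1.0 - mu)
        step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    W = mu * (1.0 - mu)
    # Cache everything the per-variant tests will reuse.
    return {"mu": mu, "W": W, "XtWX_inv": np.linalg.inv(X.T @ (W[:, None] * X))}

def score_test(z, X, y, null):
    """Score test for adding one column z to the null model (normal approximation)."""
    mu, W = null["mu"], null["W"]
    # Remove from z the part already explained by the covariates.
    z_adj = z - X @ (null["XtWX_inv"] @ (X.T @ (W * z)))
    S = float(z_adj @ (y - mu))            # score statistic
    var = float(np.sum(W * z_adj ** 2))    # its variance under the null
    return math.erfc(abs(S) / math.sqrt(2.0 * var))  # two-sided p-value

# Simulated data (hypothetical): exposure E, one extra covariate, 5 variants.
rng = np.random.default_rng(0)
n = 2000
E = rng.normal(size=n)
X = np.column_stack([np.ones(n), E, rng.normal(size=n)])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-2.0 + 0.5 * E)))).astype(float)
G = rng.binomial(2, 0.3, size=(n, 5)).astype(float)

null = fit_null_logistic(X, y)   # fitted once
pvals = [score_test(G[:, j] * E, X, y, null) for j in range(G.shape[1])]  # reused per variant
```

By contrast, a Wald or Firth test refits a full logistic model for every variant, which is where the large speed differences reported in the article come from.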

In simulation studies [see attached article], the authors show that SPAGE is 33 to 79 times faster than the Wald test and 72 to 439 times faster than Firth’s test. SPAGE can control type I error rates (i.e. false positives, in which a true null hypothesis is erroneously rejected) at the genome-wide significance level, even when case-control ratios are extremely unbalanced. In an analysis of UK Biobank data from 344,341 white British (European-ancestry) samples, the authors demonstrate that SPAGE can efficiently analyze large samples while, at the same time, controlling for unbalanced case-control ratios. This article shows just how far we have come in analyzing GxE interactions. 😊
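
To give a feel for what "calibrating" a test statistic under case-control imbalance means, the sketch below compares an ordinary normal-approximation p-value with a saddlepoint-approximation p-value for a score statistic built from Bernoulli outcomes. It uses the Barndorff-Nielsen form of the saddlepoint tail formula and is a textbook-style illustration only, not the SPAGE code; the function names, the prevalence of 0.002, and the rare-variant frequency are assumptions chosen for the example.

```python
import math
import numpy as np

def _sigmoid(x):
    return 0.5 * (1.0 + np.tanh(0.5 * x))  # overflow-safe logistic function

def spa_upper_tail(s, z, mu):
    """Approximate P(S >= s), s > 0, for S = sum_i z_i * (Y_i - mu_i), Y_i ~ Bernoulli(mu_i)."""
    logit = np.log(mu) - np.log1p(-mu)
    zmu = float(z @ mu)

    def K(t):   # cumulant generating function of S
        return float(np.sum(np.log1p(-mu) + np.logaddexp(0.0, t * z + logit)) - t * zmu)

    def K1(t):  # first derivative of K
        return float(z @ _sigmoid(t * z + logit) - zmu)

    def K2(t):  # second derivative of K
        p = _sigmoid(t * z + logit)
        return float(np.sum(z * z * p * (1.0 - p)))

    # Solve K1(t) = s for t > 0: expand a bracket, then bisect (K1 is increasing).
    lo, hi = 0.0, 1.0
    for _ in range(60):
        if K1(hi) >= s:
            break
        hi *= 2.0
    else:
        raise ValueError("s lies outside the range the statistic can reach")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if K1(mid) < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    w = math.sqrt(max(2.0 * (t * s - K(t)), 0.0))
    if t < 1e-4 or w <= 0.0:
        # Very close to the mean: the normal approximation is adequate.
        return 0.5 * math.erfc(s / math.sqrt(2.0 * K2(0.0)))
    v = t * math.sqrt(K2(t))
    return 0.5 * math.erfc((w + math.log(v / w) / w) / math.sqrt(2.0))

def spa_two_sided(s, z, mu):
    """Two-sided p-value: upper tail of S plus upper tail of -S, both at |s|."""
    a = abs(s)
    if a == 0.0:
        return 1.0
    return min(1.0, spa_upper_tail(a, z, mu) + spa_upper_tail(a, -z, mu))

# Hypothetical unbalanced setting: prevalence 0.2%, rare variant (frequency 1%).
rng = np.random.default_rng(1)
n = 10_000
mu = np.full(n, 0.002)
g = rng.binomial(2, 0.01, size=n).astype(float)
z = g - g.mean()                              # centred genotype
y = rng.binomial(1, mu).astype(float)
s = float(z @ (y - mu))                       # observed score statistic
var = float(np.sum(mu * (1.0 - mu) * z * z))
p_normal = math.erfc(abs(s) / math.sqrt(2.0 * var))
p_spa = spa_two_sided(s, z, mu)
```

With balanced data the two p-values nearly coincide; with few cases and a rare variant the score statistic is strongly skewed, so the normal approximation can be badly off in the tails, and that is the situation the saddlepoint correction is meant to handle.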


Am J Hum Genet Dec 2019; 105: 1182-1192
