Distinguishing “genetic correlation” from “causation” — across 52 diseases and complex traits

Okay, this topic is a bit intense (dense?), so stay with me on this one. It has to do with pleiotropy: when a gene, or a variant of one gene, influences two or more phenotypes (traits). Pleiotropy is often seen in complex diseases. It is also seen in the response to an environmental toxicant (e.g. cigarette smoke) or to a drug (think of all the side effects produced by, e.g., gabapentin used to treat nerve pain). How can the effects of a single-nucleotide variant (SNV) be “divvied up” among many outcomes (traits)?

Mendelian randomization (MR) is widely used to identify potential causal relationships among heritable traits. MR allows you to test for (or, in certain cases, to estimate) a causal effect from observational data in the presence of confounding factors, which are almost always there. MR uses common genetic polymorphisms (inter-individual differences in DNA sequence that make each human’s genome unique) with well-understood effects on exposure patterns (e.g. propensity to smoke cigarettes, or to drink alcohol) or effects that mimic those produced by modifiable exposures (e.g. filtered vs unfiltered cigarettes). Most importantly, the genotype (DNA sequence) must affect the disease status only indirectly, via its effect on the exposure being studied.

During meiosis (i.e. formation of ova and sperm), alleles are assigned randomly when passed from parents to offspring. If it is assumed that choice of mate is not associated with genotype, then the population genotype distribution should be unrelated to the confounding factors that typically plague observational epidemiology studies. In this regard, MR can be thought of as a “natural” randomized controlled trial, with genotype playing the role of the randomly assigned treatment arm. Because “the polymorphism” is “the instrument”, MR depends on genome-wide association studies (GWAS) having provided good candidate genes that might explain the response to risk exposure.
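
To make the “natural randomized controlled trial” analogy concrete, here is a toy simulation in which every name, the allele frequency, and all effect sizes are hypothetical, chosen only for illustration. Genotype is drawn by random allele transmission, while a made-up confounder `u` (think socioeconomic status) shapes the exposure. As with a baseline table in an RCT, the sketch compares the confounder's average across “arms”: arms defined by genotype come out balanced, whereas arms defined by the exposure itself do not.

```python
import random
import statistics

def simulate_cohort(n=30000, allele_freq=0.3, seed=7):
    """Toy cohort: hypothetical confounder u shapes exposure x; genotype g is set by allele transmission alone."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # hypothetical confounder, e.g. socioeconomic status
        # Genotype = number of effect alleles inherited (0, 1 or 2); each allele is
        # drawn independently with probability allele_freq, i.e. random mating.
        g = int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
        # Exposure depends on genotype AND on the confounder, plus noise
        x = 0.4 * g + 0.8 * u + rng.gauss(0.0, 1.0)
        rows.append((g, u, x))
    return rows

def mean_confounder_by_group(rows, key):
    """Average confounder value within each group, like a baseline-balance table in an RCT."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row[1])
    return {k: statistics.fmean(v) for k, v in sorted(groups.items())}

rows = simulate_cohort()
# "Arms" defined by genotype: confounder means should all sit near 0 (balanced)
print(mean_confounder_by_group(rows, key=lambda r: r[0]))
# "Arms" defined by exposure level: confounder means differ sharply (unbalanced)
print(mean_confounder_by_group(rows, key=lambda r: r[2] > 0))
```

The balance holds here by construction, which is exactly the point: nothing in allele transmission consults the confounder. Assortative mating or population stratification would break it in real data.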

From a statistical perspective, MR is an application of the technique of instrumental variables, with “genotype” acting as an instrument for the exposure being studied. MR is based on several assumptions: [a] that the instrument is associated with the exposure; [b] that there is no direct relationship between the instrument and the dependent variable (the outcome), other than through the exposure; and [c] that there are no direct paths between the instrument and any potential confounding factors. In addition to direct effects of “the instrument” on the trait (which can mislead the epidemiologist), confusing conclusions may also arise in the presence of: linkage disequilibrium (non-random association of alleles at different loci in a given population) with unmeasured directly-causal variants; genetic heterogeneity; pleiotropy (one gene responsible for two or more seemingly unrelated traits, often detected as a genetic correlation); or population stratification (differences in allele frequencies between a study group and a control group due to systematic differences in ancestry, rather than to association of genes with disease).
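
A minimal sketch of the instrumental-variable logic follows, assuming a single SNV, linear effects, and simulated data; the function names and effect sizes are hypothetical. The Wald ratio divides the SNV's effect on the outcome by its effect on the exposure. When the instrument assumptions hold, it recovers the true causal effect even though a confounder inflates the naive regression. Adding a direct SNV-to-outcome path (pleiotropy, which violates the “only through the exposure” assumption) biases it.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (model with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(n=50000, true_effect=0.2, direct_effect=0.0, seed=11):
    """Toy data: genotype G -> exposure X -> outcome Y, with hypothetical confounder U
    acting on both X and Y. direct_effect != 0 adds a G -> Y path that bypasses X."""
    rng = random.Random(seed)
    gs, xs, ys = [], [], []
    for _ in range(n):
        g = int(rng.random() < 0.3) + int(rng.random() < 0.3)  # 0, 1 or 2 effect alleles
        u = rng.gauss(0.0, 1.0)
        x = 0.5 * g + u + rng.gauss(0.0, 1.0)
        y = true_effect * x + direct_effect * g + u + rng.gauss(0.0, 1.0)
        gs.append(g)
        xs.append(x)
        ys.append(y)
    return gs, xs, ys

def wald_ratio(gs, xs, ys):
    """IV estimate: (effect of G on Y) / (effect of G on X)."""
    return ols_slope(gs, ys) / ols_slope(gs, xs)

g, x, y = simulate()
naive = ols_slope(x, y)  # confounded by U: well above the true effect of 0.2
iv = wald_ratio(g, x, y)  # should land close to the true effect of 0.2
g, x, y = simulate(direct_effect=0.3)
iv_pleio = wald_ratio(g, x, y)  # exclusion assumption violated: biased upward
print(round(naive, 2), round(iv, 2), round(iv_pleio, 2))
```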

SNVs that are significantly associated with one trait (the exposure; e.g. number of cigarette pack-years in smokers) can be used as genetic instruments to test for a causal effect on a second trait (the outcome; e.g. smoking-induced lung cancer). If the exposure is causal, then SNVs affecting “the exposure” should affect “the outcome” proportionately. For example, low-density lipoprotein (LDL) and triglycerides, but not high-density lipoprotein (HDL), causally affect coronary artery disease risk. However, pleiotropy presents a challenge for MR, especially when the exposure is highly polygenic (influenced by hundreds or thousands of genes).
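
The “proportionate effects” idea can be sketched with per-SNV summary statistics alone. The toy below (hypothetical function names, effect sizes and standard errors, with the same standard error assumed for every SNV) computes an inverse-variance-weighted (IVW) MR estimate: the slope, through the origin, of SNV effects on the outcome against SNV effects on the exposure. It recovers a true causal effect, but it also returns a clearly non-zero slope when the two traits merely share a genetic factor and neither causes the other. That is the false-positive problem pleiotropy creates.

```python
import random

SE = 0.01  # hypothetical GWAS standard error, assumed equal for every SNV

def simulate_summary_stats(causal_effect=0.0, shared_sd=0.0, n_snps=5000, seed=3):
    """Toy per-SNV GWAS estimates (bx, by) for exposure X and outcome Y.
    causal_effect: true effect of X on Y, so SNV effects on X carry over to Y in proportion.
    shared_sd: spread of SNV effects on a hypothetical shared factor that raises X and Y
    equally without X causing Y (pleiotropy that shows up as a genetic correlation)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_snps):
        shared = rng.gauss(0.0, shared_sd)
        alpha_x = shared + rng.gauss(0.0, 0.05)  # true SNV effect on X
        alpha_y = causal_effect * alpha_x + shared + rng.gauss(0.0, 0.05)  # true SNV effect on Y
        # Add sampling noise so each pair looks like estimated GWAS effects
        stats.append((alpha_x + rng.gauss(0.0, SE), alpha_y + rng.gauss(0.0, SE)))
    return stats

def ivw_estimate(stats, z_threshold=3.0):
    """IVW MR slope through the origin, using SNVs with |bx / SE| > z_threshold as instruments.
    With equal standard errors the weights cancel, leaving sum(bx * by) / sum(bx ** 2)."""
    instruments = [(bx, by) for bx, by in stats if abs(bx) / SE > z_threshold]
    num = sum(bx * by for bx, by in instruments)
    den = sum(bx * bx for bx, _ in instruments)
    return num / den, len(instruments)

causal_est, n_causal = ivw_estimate(simulate_summary_stats(causal_effect=0.3))
shared_est, n_shared = ivw_estimate(simulate_summary_stats(shared_sd=0.05))
print(round(causal_est, 2), n_causal)  # near 0.3, slightly diluted by estimation noise
print(round(shared_est, 2), n_shared)  # clearly positive although X has no effect on Y
```

Real IVW analyses weight each SNV by its own standard error and usually restrict instruments to genome-wide significant, LD-pruned SNVs; this sketch skips both for brevity.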

The authors [see attached] introduce a Latent Causal Variable (LCV) model, under which the genetic correlation between two traits is mediated by a latent variable having a causal effect on each trait. Trait 1 is defined as partially genetically causal for Trait 2 when the latent causal variable is more strongly genetically correlated with Trait 1 than with Trait 2 (e.g. Trait 1 = high LDL levels; Trait 2 = coronary artery disease risk). The authors quantify partial causality using the genetic causality proportion (GCP). The GCP is not the genetic correlation itself (i.e. the correlation between the genetic influences on one trait and the genetic influences on another, which reflects pleiotropy and causal overlap alike); a GCP near 0 indicates no partial causality (the latent variable is equally correlated with both traits), and a GCP of 1 indicates that Trait 1 is fully genetically causal for Trait 2. In simulations [see attached], the authors show that LCV has major advantages over MR.
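
Why genetic correlation alone cannot reveal causal direction can be shown numerically. Writing q1 and q2 for the genetic correlations of Trait 1 and Trait 2 with the latent causal variable, the LCV model gives a genetic correlation of q1 × q2. The GCP formula in the sketch is an assumed reading of the paper's definition and should be checked against the attached paper before relying on it; it gives 0 when q1 = q2, 1 when q1 = 1, and flips sign when the two traits are swapped. The function names are hypothetical.

```python
import math

def genetic_correlation(q1, q2):
    """Under the LCV model the genetic correlation is q1 * q2, where q1 and q2 are the
    traits' genetic correlations with the latent causal variable L."""
    return q1 * q2

def gcp(q1, q2):
    """Genetic causality proportion of trait 1 on trait 2, ASSUMING the definition
    gcp = (log q2^2 - log q1^2) / (log q1^2 + log q2^2), valid for 0 < |q1|, |q2| <= 1,
    not both equal to 1. It is 0 when |q1| == |q2| and 1 when |q1| == 1."""
    a, b = math.log(q1 * q1), math.log(q2 * q2)
    return (b - a) / (a + b)

# Three hypothetical trait pairs that all share a genetic correlation of 0.36
for q1, q2 in [(0.6, 0.6), (0.8, 0.45), (1.0, 0.36)]:
    print(q1, q2, round(genetic_correlation(q1, q2), 2), round(gcp(q1, q2), 2))
```

The three pairs print GCP values of 0, about 0.56, and 1, although the genetic correlation is 0.36 in every case: the same genetic correlation is compatible with no causality, partial causality, or full causality.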

The authors show that, unlike MR, LCV avoids false positives (i.e. concluding that an effect is present when it really is not) caused by genetic correlations. Across 52 traits (average N = 331,000 individuals per trait), the authors identified 30 causal relationships that had high GCP estimates. Novel findings in this study include a causal effect of LDL on bone mineral density, consistent with clinical trials of statins in which osteoporosis is a side-effect.
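
How can LCV tell a causal relationship from a shared-factor correlation when MR cannot? According to the paper's abstract, LCV is fit using the mixed fourth moments E(α1²α1α2) and E(α2²α1α2) of SNV effect sizes: if Trait 1 is causal for Trait 2, SNVs with large effects on Trait 1 also have correlated effects on Trait 2, but not vice versa. The toy below (hypothetical names; true standardized effect sizes with no sampling noise or linkage disequilibrium; sparse, heavy-tailed effects on the latent variable) computes both moments for a causal scenario and a purely shared-factor scenario with the same genetic correlation of 0.4. It illustrates the asymmetry only and is not the LCV estimator, which involves further normalization and statistical machinery.

```python
import math
import random

def sparse_effect(rng, frac_nonzero):
    """Hypothetical heavy-tailed SNV effect on the latent causal variable L: most SNVs
    have no effect, a few have large ones; scaled so the variance is 1."""
    if rng.random() < frac_nonzero:
        return rng.gauss(0.0, 1.0) / math.sqrt(frac_nonzero)
    return 0.0

def simulate_effects(q1, q2, n_snps=50000, frac_nonzero=0.2, seed=5):
    """Toy standardized SNV effects (a1, a2) on two traits: each is part L, part
    trait-specific. q1 and q2 are the traits' correlations with L."""
    rng = random.Random(seed)
    s1, s2 = math.sqrt(1 - q1 ** 2), math.sqrt(1 - q2 ** 2)
    pairs = []
    for _ in range(n_snps):
        ell = sparse_effect(rng, frac_nonzero)
        pairs.append((q1 * ell + s1 * rng.gauss(0.0, 1.0),
                      q2 * ell + s2 * rng.gauss(0.0, 1.0)))
    return pairs

def mixed_fourth_moments(pairs):
    """Return (E[a1^2 * a1*a2], E[a2^2 * a1*a2]) averaged over SNVs."""
    n = len(pairs)
    m12 = sum(a1 ** 3 * a2 for a1, a2 in pairs) / n
    m21 = sum(a2 ** 3 * a1 for a1, a2 in pairs) / n
    return m12, m21

# Trait 1 fully genetically causal for trait 2: genetic correlation 1.0 * 0.4 = 0.4
causal = mixed_fourth_moments(simulate_effects(q1=1.0, q2=0.4))
# Same genetic correlation from a shared factor with no causal direction (q1 == q2)
shared = mixed_fourth_moments(simulate_effects(q1=math.sqrt(0.4), q2=math.sqrt(0.4)))
print("causal:", [round(m, 2) for m in causal])  # first moment clearly larger
print("shared:", [round(m, 2) for m in shared])  # the two roughly equal
```

With Gaussian (non-sparse) effects, both moments would equal three times the genetic correlation whatever the causal structure, which is why the toy uses sparse effects.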

Nat Genet Dec 2018; 50: 1728–1734

COMMENT: This last sentence, “Novel findings in this study include a causal effect of LDL on bone mineral density, consistent with clinical trials of statins in which osteoporosis is a side-effect”, seems to imply that osteoporosis is a side-effect of statin use. Actually, what has been observed is that statins improve bone mineral density; therefore, if anything, statins might be useful in treating osteoporosis (more studies are needed here, because a strong association with reduced fracture risk apparently has not yet been demonstrated). In short, osteoporosis is not a side-effect of statin use; quite the opposite: osteogenesis (build-up of bone) appears to be a (beneficial) effect of statins.

This entry was posted in Center for Environmental Genetics.