Genome-wide association (GWA) studies have been successful in detecting genetic variants (DNA base changes) correlated with phenotypes (traits) of clinical interest. These range from complex diseases such as schizophrenia or type 2 diabetes to quantitative traits such as body mass index, height, blood pressure and serum triglyceride levels. However, the power to detect these variants depends on the number of individuals whose phenotypes are collected. For phenotypes that are difficult to measure, the available sample size may be too small to achieve the desired statistical power. By contrast, surrogate or related phenotypes are often easier to measure and have already been collected in very large samples.
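The following minimal Python sketch shows how power depends on sample size. It uses the standard power calculation for a single-variant, two-sided test of a quantitative trait. It assumes an additive model, standardized genotype and phenotype, and a non-centrality parameter of sqrt(N*q/(1-q)) for a variant that explains a fraction q of phenotypic variance. The function name gwas_power, the genome-wide threshold of 5e-8 and the value q = 0.002 are illustrative choices, not taken from the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gwas_power(n: int, q: float, alpha: float = 5e-8) -> float:
    """Power of a two-sided 1-df z-test for a variant explaining a fraction q
    of phenotypic variance in n individuals (additive model, standardized
    genotype and phenotype; illustrative assumptions)."""
    ncp = (n * q / (1.0 - q)) ** 0.5               # non-centrality parameter
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    # P(|Z| > z_crit) where Z ~ N(ncp, 1)
    return _STD_NORMAL.cdf(ncp - z_crit) + _STD_NORMAL.cdf(-ncp - z_crit)


for n in (1_000, 5_000, 20_000, 100_000):
    print(f"N = {n:>7,}: power = {gwas_power(n, q=0.002):.3f}")
```

In this setting, power is essentially zero at a few thousand individuals and becomes high only at tens of thousands. A phenotype measured in a small cohort can therefore miss a variant that a larger study would detect.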
The attached publication shows how these related phenotypes can be used to impute the phenotype of interest (the target phenotype) in individuals for whom it was not measured, and then to test the imputed phenotype for association. The authors' approach uses the correlation structure among phenotypes to perform the imputation. This correlation structure can be estimated from a smaller, complete dataset in which both the target and the related phenotypes have been collected. Under certain modeling assumptions, the statistical power of the resulting association test can be computed analytically from the correlation structure of the phenotypes used for imputation.
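The imputation step can be illustrated with a minimal sketch under a multivariate-normal assumption. Here the imputed target is the conditional expectation of the standardized target given the related phenotypes. The weights are w = Sigma_rr^-1 Sigma_rt, estimated from a small complete dataset. This is a textbook best-linear-predictor version written for illustration, and the authors' estimator may differ in details such as covariates or partially missing related phenotypes. The helper names, the correlation matrix and the sample sizes are hypothetical.

```python
import numpy as np


def imputation_weights(complete: np.ndarray) -> tuple[np.ndarray, float]:
    """Estimate imputation weights from a dataset with all phenotypes measured.

    complete: (n, 1 + k) array; column 0 is the target phenotype,
    columns 1..k are the related phenotypes.
    Returns w = Sigma_rr^-1 Sigma_rt and R^2 = Sigma_tr w, the fraction of
    target variance explained by the related phenotypes.
    """
    corr = np.corrcoef(complete, rowvar=False)  # (1 + k) x (1 + k) correlations
    sigma_rr = corr[1:, 1:]                     # related vs related
    sigma_rt = corr[1:, 0]                      # related vs target
    w = np.linalg.solve(sigma_rr, sigma_rt)     # solve instead of explicit inverse
    return w, float(sigma_rt @ w)


def impute_target(related: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Impute the standardized target from an (m, k) array of related phenotypes.

    Assumes the related phenotypes have the same distribution as in the
    complete dataset, so they can be standardized within this dataset.
    """
    z = (related - related.mean(axis=0)) / related.std(axis=0)
    return z @ w


rng = np.random.default_rng(0)
# Hypothetical true correlations among (target, related 1, related 2).
cov = np.array([[1.0, 0.6, 0.4],
                [0.6, 1.0, 0.3],
                [0.4, 0.3, 1.0]])
complete = rng.multivariate_normal(np.zeros(3), cov, size=2_000)  # small, fully phenotyped
large = rng.multivariate_normal(np.zeros(3), cov, size=50_000)    # target treated as missing

w, r2 = imputation_weights(complete)
y_hat = impute_target(large[:, 1:], w)
corr_true = np.corrcoef(y_hat, large[:, 0])[0, 1]  # checkable only because this is simulated
print("weights:", np.round(w, 3), " R^2:", round(r2, 3))
print("corr(imputed, true target):", round(corr_true, 3))
# Hypothetical assumption (not from the paper): the variant acts on the target and
# reaches the related phenotypes only through their correlation with it. Then the
# test on y_hat has roughly the power of a direct study with R^2 * N individuals.
print("effective N:", round(r2 * len(large)))
```

The last line rests on an extra assumption added here. Suppose the variant affects the target and reaches the related phenotypes only through their correlation with it. Then the non-centrality parameter of the test on the imputed phenotype shrinks by a factor of R, the correlation between imputed and true target. Plugging the effective sample size R^2 * N into a standard power formula then gives one simple analytic power estimate.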
In addition, the authors' method can impute the association summary statistic of the target phenotype as a weighted linear combination of the summary statistics of the related phenotypes. The method therefore applies to datasets for which only summary statistics are available, not individual-level genotypes. The authors illustrate their approach by analyzing loci associated with triglycerides (TGs), body mass index (BMI) and systolic blood pressure (SBP) in the Northern Finland Birth Cohort dataset.
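The summary-statistic version can be sketched under the following assumptions: phenotypes are standardized, the z-scores of all related phenotypes come from the same N individuals, and per-variant effects are small, so that z is approximately sqrt(N) * corr(genotype, phenotype). Under these assumptions, the same weights w give z_imp = w^T z_r / sqrt(w^T Sigma_rr w). The denominator rescales by the standard deviation of the imputed phenotype. This normalization is derived here under the stated assumptions and is not necessarily the exact formula in the paper. The effect sizes and helper names are hypothetical. The script checks the summary-level result against an individual-level test of the imputed phenotype.

```python
import numpy as np


def impute_z(z_related: np.ndarray, sigma_rr: np.ndarray,
             sigma_rt: np.ndarray) -> np.ndarray:
    """Impute target-phenotype z-scores from related-phenotype z-scores.

    z_related: (n_snps, k) z-scores of k related phenotypes, same individuals.
    sigma_rr:  (k, k) phenotypic correlations among the related phenotypes.
    sigma_rt:  (k,)   correlations of the related phenotypes with the target.
    """
    w = np.linalg.solve(sigma_rr, sigma_rt)  # same weights as individual-level imputation
    sd_hat = np.sqrt(w @ sigma_rr @ w)       # sd of the imputed phenotype, sqrt(R^2)
    return z_related @ w / sd_hat


def zscores(geno: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Marginal z-scores approximated as sqrt(N) * corr(genotype_j, y) per SNP."""
    g = (geno - geno.mean(axis=0)) / geno.std(axis=0)
    ys = (y - y.mean()) / y.std()
    return np.sqrt(len(y)) * (g.T @ ys) / len(y)


rng = np.random.default_rng(1)
n, n_snps = 20_000, 5
# Hypothetical correlations among (target, related 1, related 2).
cov = np.array([[1.0, 0.6, 0.4],
                [0.6, 1.0, 0.3],
                [0.4, 0.3, 1.0]])
sigma_rr, sigma_rt = cov[1:, 1:], cov[1:, 0]
geno = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)
pheno = rng.multivariate_normal(np.zeros(3), cov, size=n)
pheno[:, 1] += 0.05 * geno[:, 0]  # hypothetical effect of SNP 0 on related phenotype 1

# Summary level: only the related phenotypes' z-scores are used.
z_rel = np.column_stack([zscores(geno, pheno[:, j]) for j in (1, 2)])
z_imp = impute_z(z_rel, sigma_rr, sigma_rt)

# Individual-level check: impute the phenotype first, then test it directly.
rel = pheno[:, 1:]
y_hat = ((rel - rel.mean(axis=0)) / rel.std(axis=0)) @ np.linalg.solve(sigma_rr, sigma_rt)
z_direct = zscores(geno, y_hat)
print("summary-level z:   ", np.round(z_imp, 2))
print("individual-level z:", np.round(z_direct, 2))
```

The two rows agree up to the small difference between the assumed correlation matrix and the sample correlation of the simulated individuals. In practice, Sigma_rr and Sigma_rt would be estimated from a complete dataset, as the summary above describes.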
Am J Hum Genet, July 2016; 99: 89–103