Assessing Digital Phenotyping to Enhance Genetic Studies of Human Diseases

Every genotype (genetic constitution of an individual organism) results in one or more phenotypes (traits). Long ago, many assumed that one DNA variant (genotype) would be responsible for one phenotype (e.g. a disease, clinical presentation, or quantitative trait such as height or weight); this can be true for certain monogenic (Mendelian) diseases. In fact, the Human Phenome Project was initiated in about 1996, before the genome was realized to be so complicated. However, it is now abundantly clear that, for virtually all multifactorial traits, one DNA variant is often associated with multiple phenotypes (this is called “pleiotropy”; such a variant is “pleiotropic”, also called “polyphenic”). The converse of pleiotropy is “polygeny” (the trait is “polygenic”), i.e. two or more genes affecting one trait, such as eye color.

The attached article examines digital-phenotyping and unstructured-phenotype data, showing how these can be combined with structured data such as hospital records to identify cases for genome-wide association studies (GWAS). These GEITP pages believe such attempts are extremely difficult and perhaps futile. [For example, recall yesterday’s GEITP blog on autism spectrum disorder (ASD); large-effect variants associated with ASD were also found to be associated with numerous other neurological disorders.] ☹

GWAS for binary phenotypes (e.g. ‘patient has disease X’, ‘patient does not have disease X’) typically obtain cases by way of recruitment through medical systems or archived medical samples; cases can then be compared to controls, or to random population controls (in which the disease has a certain prevalence in the population). Recent studies, however, have begun to rely on self-reported phenotypes, collected via questionnaires or via internet or mobile phone applications. Such “digital phenotyping” may be faster and cheaper than standard cohort-study approaches, but the extent to which it agrees with more traditional phenotyping approaches for GWAS is largely unknown. Previous attempts to estimate agreement between the two phenotyping approaches have focused on a small number of top associations; they have not systematically assessed agreement across the hundreds or thousands of variants likely associated with most complex polygenic traits.

For instance, a GWAS of self-reported thrombosis events found strong agreement between its top associations and those of previous cohort-based studies, as displayed in Manhattan plots (a type of scatter plot, usually used to display data with a large number of data-points, many of non-zero amplitude, and with a distribution of higher-magnitude values; these are commonly used in GWAS to display significant DNA variants). Other studies have reported overlaps with genome-wide significant loci from cohort studies, but these studies have not investigated the extent to which genetic effects that did not reach genome-wide significance agree with one another.
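For readers unfamiliar with Manhattan plots, here is a minimal Python sketch of the two quantities behind them: the plotted height of each variant, which is -log10 of its association P-value, and the cut-off for "genome-wide significance". The threshold of 5e-8 is the conventional value and is an assumption here, not a figure taken from the attached article; the variant names and P-values are made up purely for illustration.

```python
import math

# Conventional genome-wide significance threshold (assumed here, not from the article).
GENOME_WIDE_THRESHOLD = 5e-8


def manhattan_height(p_value):
    """Return -log10(p), the y-axis value of a variant on a Manhattan plot."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be in the interval (0, 1]")
    return -math.log10(p_value)


def is_genome_wide_significant(p_value, threshold=GENOME_WIDE_THRESHOLD):
    """True if the variant's P-value falls below the significance threshold."""
    return p_value < threshold


# Hypothetical variants with made-up P-values (illustration only).
variants = {"variant_A": 3e-12, "variant_B": 0.04, "variant_C": 4e-8}

# Names of the variants that would rise above the significance line on the plot.
hits = sorted(name for name, p in variants.items() if is_genome_wide_significant(p))

for name, p in variants.items():
    print(name, round(manhattan_height(p), 2), is_genome_wide_significant(p))
```

Comparing only the `hits` of two studies is the "top associations" approach described above; it says nothing about the many variants (like the hypothetical `variant_B`) that fall below the line.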

Authors [see attached article] used genetic parameters (including genetic correlation) to evaluate whether GWAS performed using cases in the UK Biobank, ascertained from hospital records, questionnaire responses, and family history of disease, implicate similar disease genetics across a range of effect-sizes. Authors found that hospital-record GWAS and questionnaire GWAS are largely able to identify similar genetic effects for many complex phenotypes, and that combining both phenotyping methods improves statistical power to detect genetic associations. Authors also showed that family-history GWAS (using cases ascertained on family history of disease) agree with combined hospital-record-and-questionnaire GWAS, and that family-history GWAS have greater statistical power to detect genetic associations for some phenotypes. Overall, authors believe this study demonstrates that digital-phenotyping and unstructured-phenotype data can be combined with structured data such as hospital records to identify cases for GWAS in biobanks, and that this improves the ability of such studies to identify genetic associations.
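To give a feel for what "agreement across a range of effect-sizes" means, below is a rough Python sketch that compares per-variant effect-size estimates from two hypothetical GWAS of the same disease. It is only a crude proxy: formal genetic correlation is estimated with dedicated statistical methods that account for sampling error and linkage disequilibrium, and this sketch is not the authors' procedure. All numbers and names are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of effect sizes."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("effect sizes must not all be identical")
    return cov / math.sqrt(var_x * var_y)


def sign_concordance(xs, ys):
    """Fraction of variants whose effects point in the same direction."""
    agree = sum(1 for x, y in zip(xs, ys) if x * y > 0)
    return agree / len(xs)


# Made-up effect sizes for the same five variants in two hypothetical GWAS:
# one with cases from hospital records, one with cases from questionnaires.
hospital_betas = [0.10, -0.05, 0.02, 0.08, -0.03]
questionnaire_betas = [0.09, -0.04, -0.01, 0.07, -0.02]

r = pearson(hospital_betas, questionnaire_betas)
concordance = sign_concordance(hospital_betas, questionnaire_betas)
print(round(r, 3), concordance)
```

Unlike a comparison of top hits alone, this kind of summary uses every variant, including those far from genome-wide significance, which is the gap the authors set out to address.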


Am J Hum Genet 7 May 2020; 106: 611-622
