Integrating Clinical Data and Imputed Transcriptome from GWAS to Uncover Complex Disease Subtypes: Applications in Psychiatry and Cardiology

A more accurate classification of complex diseases (such as psychiatric illnesses and cardiometabolic disorders) into clinically and biologically more homogeneous subtypes could improve our understanding of disease pathophysiology and aid the development of more targeted interventions (e.g. drug therapy). Traditionally, disease subtyping is based on clinical characteristics alone; however, subtypes identified this way may not reflect underlying biological mechanisms. For example, the same disease symptom may be caused by different mechanisms in different subjects. (There are two types of genetic heterogeneity: allelic heterogeneity occurs when a similar phenotype is produced by different alleles of the same gene; locus heterogeneity occurs when a similar phenotype is produced by mutations in different genes. The opposite is ‘pleiotropy’, in which one gene or variant contributes to two or more seemingly unrelated phenotypic traits.)

Patients with similar clinical presentations can also have varying responses to treatment. The last 15 years have witnessed the remarkable success of genome-wide association studies (GWAS), which have identified many susceptibility loci for complex diseases. The use of GWAS data in translational contexts has been widely advocated. For example, there has been increasing interest in applying GWAS data to risk prediction, drug discovery, and drug repurposing (i.e. discovering that a drug used for one condition is efficacious in treating a different condition). Currently, more than 3,800 GWAS have been published, and the number continues to expand. Despite the vast amount of GWAS data available, one important translational application has been largely ignored: can genomic information from GWAS help to improve patient stratification or disease subtyping? Subtyping might be improved by combining clinical and genomic information.

The authors [see attached article] propose an analytic framework for discovering complex-disease subgroups by leveraging both GWAS-predicted gene expression levels and clinical data via a multi-view bicluster analysis. This approach connects single-nucleotide variants (SNVs) to genes through their effects on expression, so that the analysis is more biologically relevant and interpretable than a purely SNV-based analysis. Transcriptomes from different tissues can also be readily modeled. The authors also propose various evaluation metrics for assessing clustering performance. Their framework was able to subtype schizophrenia patients into diverse subgroups that have different prognoses and treatment responses.
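To make the idea of "GWAS-predicted gene expression" concrete, here is a minimal sketch, not the authors' implementation. It assumes that a gene's expression is imputed as a weighted sum of a subject's allele dosages, with weights taken from an external expression reference panel. The gene names, SNV IDs, weights, and clinical values are all made up for illustration, and the biclustering step itself is not shown.

```python
# Hypothetical per-gene SNV weights (illustrative values only; in practice these
# would come from an expression reference panel for a given tissue).
WEIGHTS = {
    "GENE_A": {"rs1": 0.5, "rs2": -0.2},
    "GENE_B": {"rs2": 0.1, "rs3": 0.4},
}


def impute_expression(dosages, weights=WEIGHTS):
    """Predict each gene's expression as a weighted sum of allele dosages (0 to 2)."""
    return {
        gene: sum(w * dosages.get(snv, 0.0) for snv, w in snv_weights.items())
        for gene, snv_weights in weights.items()
    }


def build_views(subjects):
    """Return (expression_view, clinical_view), each with one row per subject."""
    genes = sorted(WEIGHTS)
    expr_view, clin_view = [], []
    for subject in subjects:
        expr = impute_expression(subject["dosages"])
        expr_view.append([expr[g] for g in genes])
        clin_view.append(list(subject["clinical"]))
    return expr_view, clin_view


# Two made-up subjects: genotype dosages plus two clinical variables each.
subjects = [
    {"dosages": {"rs1": 2, "rs2": 1, "rs3": 0}, "clinical": [27.5, 1]},
    {"dosages": {"rs1": 0, "rs2": 2, "rs3": 1}, "clinical": [22.0, 0]},
]
expr_view, clin_view = build_views(subjects)
```

The two resulting matrices share the same subjects (rows) but have different feature sets (columns), which is the kind of input a multi-view clustering method takes. Using a different tissue would, under this assumption, simply mean swapping in a different set of weights.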

The authors also applied their framework to the Northern Finland Birth Cohort (NFBC) 1966 dataset and, in a gender-stratified analysis, identified high and low cardiometabolic risk subgroups. The prediction strength by cross-validation was greater than 80%, suggesting good consistency of the clustering model. These results suggest that a more data-driven and biologically informed approach to subtyping psychiatric disorders and defining metabolic syndrome can be achieved. Finally, the authors found that genes selected without bias by this algorithm are significantly enriched for known susceptibility genes discovered in GWAS of schizophrenia or cardiovascular diseases. This proposed framework therefore opens up a new approach to subject stratification. 😊
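For readers unfamiliar with "prediction strength", the sketch below shows one common way to compute it, assuming the term is used in its usual sense: cluster a training half and a test half separately, then ask how often pairs of test subjects that share a test cluster are also placed together by the training centroids, and report the worst cluster. The tiny 1-D k-means and the toy data are illustrative stand-ins, not the authors' clustering method or data.

```python
from itertools import combinations


def assign(value, centroids):
    """Index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))


def kmeans_1d(values, k, iters=50):
    """Tiny 1-D k-means; centroids start at evenly spaced order statistics."""
    ordered = sorted(values)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        labels = [assign(v, centroids) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return centroids


def prediction_strength(train, test, k):
    """Minimum, over test clusters, of the fraction of co-clustered test pairs
    that the training centroids also assign to a common cluster."""
    train_centroids = kmeans_1d(train, k)
    test_centroids = kmeans_1d(test, k)
    test_labels = [assign(v, test_centroids) for v in test]
    train_based = [assign(v, train_centroids) for v in test]
    strengths = []
    for c in range(k):
        idx = [i for i, lab in enumerate(test_labels) if lab == c]
        pairs = list(combinations(idx, 2))
        if not pairs:  # clusters with fewer than two members have no pairs
            continue
        agree = sum(train_based[i] == train_based[j] for i, j in pairs)
        strengths.append(agree / len(pairs))
    return min(strengths)


# Toy data: two well-separated groups, split deterministically into halves.
low = [i * 0.1 for i in range(20)]
high = [10 + i * 0.1 for i in range(20)]
train = low[0::2] + high[0::2]
test = low[1::2] + high[1::2]

ps_two = prediction_strength(train, test, k=2)
ps_three = prediction_strength(train, test, k=3)
```

A value close to 1 means the clusters found in one half of the data reproduce well in the other half. Repeating the split across cross-validation folds and averaging would give a summary figure comparable in spirit to the one quoted above.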


Am J Hum Genet Dec 2019; 105: 1193-1212
