Accurate and Scalable Construction of Polygenic Risk Scores (PRS) in Large Biobank Data Sets; Predictive Utility of PRS for Coronary Heart Disease in 3 Racial and Ethnic Groups

As these GEITP pages have often discussed, there are relatively simple monogenic (Mendelian) traits, in which one or only a few genes contribute to the phenotype (trait), and multifactorial traits (e.g. human complex diseases, and quantitative traits such as height, serum cholesterol levels and body mass index), in which hundreds or thousands of small-effect genes contribute to the trait. For the latter, polygenic risk scores (PRS) are being promoted as the best approach, because so many genes, interacting in complex ways, are involved. In simple terms, the PRS for a phenotype is a weighted sum, across genome-wide single-nucleotide variants (SNVs), of the number of effect (risk) alleles an individual carries at each SNV, with each SNV weighted by its estimated effect size. By aggregating the contributions of many SNVs toward the phenotype of interest, PRS can be used to estimate an individual’s inherited component (i.e. his/her genetic predisposition underlying the phenotype of interest).
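To make the “weighted sum” concrete, here is a minimal Python sketch (using NumPy) under the usual additive assumption: each individual’s genotype at each SNV is coded as a count of effect alleles (0, 1 or 2), and each SNV carries a per-allele effect-size estimate, for example from a GWAS. The function name and the toy numbers are hypothetical illustrations, not data from either study discussed here.

```python
import numpy as np


def polygenic_risk_score(genotypes, effect_sizes):
    """Return one PRS per individual (hypothetical helper, additive model assumed).

    genotypes:    (n_individuals, n_snvs) matrix of effect-allele counts (0, 1, 2)
    effect_sizes: (n_snvs,) vector of estimated per-allele effects (e.g. GWAS betas)
    """
    genotypes = np.asarray(genotypes, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if genotypes.shape[1] != effect_sizes.shape[0]:
        raise ValueError("number of SNVs in genotypes and effect_sizes must match")
    # PRS_i = sum over SNVs j of beta_j * g_ij, i.e. a weighted sum of allele counts
    return genotypes @ effect_sizes


# Hypothetical toy data: 3 individuals genotyped at 4 SNVs
G = np.array([[0, 1, 2, 1],
              [2, 2, 0, 0],
              [1, 0, 1, 2]])
beta = np.array([0.10, -0.05, 0.20, 0.02])
scores = polygenic_risk_score(G, beta)
print(scores)  # approximately 0.37, 0.10, 0.34 for the three individuals
```

Real pipelines must also handle allele alignment, missing genotypes, and the choice of which SNVs and which effect-size estimates to include; the last of these is where PRS construction methods such as DBSLMM differ from one another.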

By estimating genetic predisposition, PRS serves as the earliest measurable (and the most stable) predictor of disease and of disease-related complex traits, because an individual’s germline genotype is fixed at conception. [“PGS” is commonly used for a polygenic score of a quantitative trait, whereas “PRS” is the preferred term when the phenotype of interest is a complex disease.] We will use only the term “PRS” in this discussion.

PRS have been widely used in a range of genetic applications, including: disease risk prediction, genetic prediction of complex traits, prioritization of preventive interventions, understanding missing heritability, modeling polygenic adaptation, genomic selection in animal and plant breeding programs, transcriptome-wide association studies, and Mendelian randomization analysis (i.e. a method that uses measured variation in genes of known function to examine the causal effect of a modifiable exposure on disease in observational studies). Accurate construction of PRS can facilitate disease prevention and intervention at an early stage, and might help in developing personalized medicine.
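For readers unfamiliar with Mendelian randomization, one widely used estimator is the inverse-variance-weighted (IVW) estimate, which combines per-variant Wald ratios (the variant’s effect on the outcome divided by its effect on the exposure) computed from summary statistics. The Python sketch below is a minimal fixed-effect version that assumes independent variants and valid genetic instruments; the function name and the numbers are hypothetical, and neither attached article is being reproduced here.

```python
import numpy as np


def ivw_causal_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted MR estimate (hypothetical helper).

    Each variant j gives a Wald ratio beta_outcome[j] / beta_exposure[j].
    The ratios are averaged with first-order weights beta_exposure[j]**2 / se_outcome[j]**2.
    Returns (causal_estimate, standard_error).
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)
    weights = bx**2 / se**2          # inverse variance of each Wald ratio (first order)
    wald_ratios = by / bx            # per-variant causal-effect estimates
    estimate = np.sum(weights * wald_ratios) / np.sum(weights)
    std_error = 1.0 / np.sqrt(np.sum(weights))
    return float(estimate), float(std_error)


# Hypothetical summary statistics for three independent variants
est, se = ivw_causal_estimate(
    beta_exposure=[0.10, 0.20, 0.05],
    beta_outcome=[0.020, 0.041, 0.009],
    se_outcome=[0.010, 0.012, 0.008],
)
print(round(est, 3), round(se, 3))
```

Because each variant’s Wald ratio here is close to 0.2, the IVW estimate lands near 0.2; real analyses add sensitivity methods (e.g. for pleiotropy) that this sketch omits.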

The authors [see first attachment] developed a method called Deterministic Bayesian Sparse Linear Mixed Model (DBSLMM). This method relies on flexible modeling assumptions about the effect-size distribution to achieve robust and accurate prediction performance across a range of genetic architectures (the underlying genetic basis of the phenotypic traits of interest, and their variational properties). Using simulations, the authors first showed that DBSLMM is both scalable and accurate across a range of realistic genetic architectures. Analyzing 25 traits in UK Biobank and comparing DBSLMM with previously existing approaches, the authors found that DBSLMM achieves an average accuracy gain of 2.03% to 101% in internal cross-validations. In external validations on two separate datasets [including one from BioBank Japan], DBSLMM achieved an average accuracy gain of 14.7% to 523%. In these real-data applications, DBSLMM was 1.03 to 28.1 times faster [and used only 7.4% to 25% of the physical memory] compared with other multiple regression-based PRS methods. Overall, the authors conclude that DBSLMM represents an accurate and scalable method for constructing PRS in biobank-scale datasets.
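The phrase “a range of genetic architectures” can be illustrated by simulation. The sketch below is an assumption-laden illustration in the spirit of sparse-plus-dense (BSLMM-style) effect-size models: every SNV receives a small effect, and a small random fraction of SNVs also receives a large effect. It only draws effect sizes; it is not DBSLMM’s deterministic fitting algorithm, and all names and parameter values are hypothetical.

```python
import numpy as np


def simulate_sparse_plus_dense_effects(n_snvs, prop_large, var_large, var_small, seed=0):
    """Draw SNV effect sizes from a sparse-plus-dense normal mixture (illustrative only).

    Every SNV gets a small 'dense' effect ~ N(0, var_small); a random fraction
    prop_large of SNVs additionally gets a large 'sparse' effect ~ N(0, var_large).
    Returns the effect sizes and a boolean mask marking the large-effect SNVs.
    """
    rng = np.random.default_rng(seed)
    dense = rng.normal(0.0, np.sqrt(var_small), size=n_snvs)
    is_large = rng.random(n_snvs) < prop_large
    sparse = np.where(is_large, rng.normal(0.0, np.sqrt(var_large), size=n_snvs), 0.0)
    return dense + sparse, is_large


def share_from_large_effects(effects, is_large):
    """Fraction of the summed squared effects coming from large-effect SNVs.

    This is only a rough proxy for the share of genetic variance, assuming
    standardized genotypes (it ignores allele frequencies and linkage).
    """
    total = np.sum(effects**2)
    return float(np.sum(effects[is_large] ** 2) / total) if total > 0 else 0.0


# Hypothetical architecture: 10,000 SNVs, about 1% of which carry an extra large effect
beta, large_mask = simulate_sparse_plus_dense_effects(
    n_snvs=10_000, prop_large=0.01, var_large=1e-2, var_small=1e-5
)
print(int(large_mask.sum()), round(share_from_large_effects(beta, large_mask), 3))
```

Setting prop_large to 0 gives a purely polygenic (infinitesimal) architecture, and setting var_small to 0 gives a purely sparse one; a method described as robust across architectures has to predict well over this whole spectrum.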

Going from simulations to a real-life disease phenotype, the authors of the second study [see second attachment] investigated associations of “restricted” vs genome-wide PRS with coronary heart disease (CHD) in three major racial and ethnic groups in the U.S., comprising 45,645 European-Americans (EA), 7,597 African-Americans (AA), and 2,493 individuals of Hispanic ethnicity (HE). Over a median follow-up of 11.1 years, 2,652 incident CHD events occurred. Hazard ratios and odds ratios for the association of restricted PRS with CHD were similar in the EA and HE cohorts, but lower in the AA cohorts. Genome-wide PRS were more strongly associated with CHD than restricted PRS were [see the article for more details]. These findings highlight the potential clinical utility of PRS for CHD, as well as the need to assemble diverse cohorts in order to generate ancestry-specific and ethnicity-specific PRS. 😊
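Association results like these are usually reported per standard deviation (SD) of the PRS, and because PRS distributions can differ between ancestry groups, scores are often standardized within each group before modeling. The Python sketch below shows that idea on simulated data: it z-scores a PRS within each group label and then fits a plain logistic regression by Newton-Raphson to obtain an odds ratio per SD. It is a simplified illustration under stated assumptions (no covariates such as age, sex or ancestry principal components, and no time-to-event modeling, so it yields an odds ratio rather than a hazard ratio); the function names and data are hypothetical and do not reproduce the study’s analysis.

```python
import numpy as np


def standardize_within_group(prs, group):
    """Z-score the PRS separately within each group label (e.g. ancestry group)."""
    prs = np.asarray(prs, dtype=float)
    group = np.asarray(group)
    z = np.empty_like(prs)
    for label in np.unique(group):
        mask = group == label
        z[mask] = (prs[mask] - prs[mask].mean()) / prs[mask].std()
    return z


def odds_ratio_per_sd(z, is_case, n_iter=25):
    """Fit logit P(case) = b0 + b1 * z by Newton-Raphson and return exp(b1)."""
    X = np.column_stack([np.ones_like(z), z])
    y = np.asarray(is_case, dtype=float)
    b = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))            # fitted case probabilities
        score = X.T @ (y - p)                          # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # Fisher information matrix
        b += np.linalg.solve(info, score)              # Newton-Raphson update
    return float(np.exp(b[1]))


# Simulated, hypothetical cohort: two groups whose raw PRS distributions are shifted
rng = np.random.default_rng(1)
n = 20_000
group = rng.choice(["group_1", "group_2"], size=n)
raw_prs = rng.normal(size=n) + np.where(group == "group_2", 3.0, 0.0)
z = standardize_within_group(raw_prs, group)
true_log_or = 0.5
is_case = rng.random(n) < 1.0 / (1.0 + np.exp(-(-2.0 + true_log_or * z)))
print(round(odds_ratio_per_sd(z, is_case), 2))
```

The printed estimate should fall close to exp(0.5) ≈ 1.65, the odds ratio built into the simulation. Fitting such a model separately in each cohort and comparing the per-SD estimates is, in much simplified form, the kind of comparison summarized above for the EA, AA and HE groups.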


Am J Hum Genet May 2020; 106: 679-693 & 707-716

This entry was posted in Center for Environmental Genetics.