Power of inclusion: Enhancing polygenic prediction with admixed individuals (simplified)

I apologize. Yesterday’s GEITP blog was regarded by some “as a bit difficult to understand” (i.e., “more basic background” is needed, please). So, here goes:

For more than two decades, genome-wide association studies (GWASs) have unequivocally shown that common complex disorders have a polygenic genetic architecture — which has allowed researchers to identify genetic variants (changes in DNA sequence) that are associated with specific diseases. For some traits, dozens or hundreds of single-nucleotide variants (SNVs) have been found to be associated. The “winner” to date is the trait (phenotype) for HEIGHT in which 12,111 SNVs are involved…!!

Many traits can be dissected by GWASs, and the hope is that the discovery of unexpected genes might help explain etiology or improve treatment. An intriguing (gene-environment) example I received today is a genetic test to forewarn the physician (and the patient with Parkinson disease, PD) that dopamine agonists can cause an unwanted adverse reaction, impulse control disorder (ICD), in a subgroup of patients (after all, who wants to treat a horrible disease like PD by giving a drug that makes things worse??):

Impulse control disorders (ICDs) often appear in people with Parkinson disease (PD), specifically those treated with a class of drugs called dopamine agonists. Newly published research, funded in part by The Michael J. Fox Foundation (MJFF), suggests genetic data can help provide warnings to those at the highest risk. If doctors are able to assess ICD risk consistently, it would help them warn people about ICDs and personalize treatments to minimize that risk.

Currently, doctors often use dopamine agonists to treat Parkinson’s disease. These agonists stimulate activity when binding with dopamine receptors, which can help alleviate Parkinson’s symptoms like motor challenges. However, the rise in use of dopamine agonists has caused ICDs to appear more commonly.

Knowing a person’s risk for developing an impulse control disorder can help chart their treatment path. For example, a doctor might choose a dopamine replacement like levodopa if their patient is at high risk for an ICD, while they might choose a dopamine agonist (which mimics dopamine, rather than replacing it) for someone with a lower risk.

The authors of a paper recently published in the Annals of Clinical and Translational Neurology, led by a team at the University of Pennsylvania, say they can now use genetic data (along with other risk factors) to determine the risk that a person with Parkinson’s will develop ICDs. Knowing that risk allows for more individualized approaches (“precision medicine”) to their treatment, such as replacing dopamine agonists with dopamine replacements.

All variants (DNA nucleotide changes) in an individual patient’s whole genome can further be combined into a polygenic risk score (PRS) that captures part of that individual’s susceptibility to a specific disease. PRSs have been widely applied in research studies, confirming the association between the scores and disease status, but their clinical utility has yet to be established. Polygenic risk scores may be used to estimate an individual’s lifetime genetic risk of disease, but the current discriminative ability is low in the general population.
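To make "combining variants into a score" concrete, here is a minimal sketch, assuming the simplest additive form: each variant contributes its effect-allele count (0, 1, or 2) multiplied by a per-variant weight. The function name and the weights below are hypothetical, chosen only for illustration; real scores use effect sizes estimated from GWAS data across many thousands of variants.

```python
def polygenic_score(dosages, weights):
    """Additive score: sum of (effect-allele count x per-variant weight)."""
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per variant")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three variants (illustrative values only).
example_weights = [0.5, -0.25, 1.0]

# One person's genotype: copies of the effect allele carried at each variant.
example_dosages = [2, 1, 0]

# 2*0.5 + 1*(-0.25) + 0*1.0 = 0.75
example_score = polygenic_score(example_dosages, example_weights)
print(example_score)
```

A single score means little by itself; it becomes informative only when compared with the scores of many other people.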

Clinical implementation of PRSs may be useful in cohorts (the larger the N of genomes, the better) where there is a higher prior probability of disease (e.g., in early stages of diseases to assist in diagnosis or to inform treatment choices). Important considerations are the weaker evidence base in application to non-European ancestry and the challenges in translating an individual’s PRS from a percentile of a normal distribution to a lifetime disease risk. In the attached review, it was confusing that the authors used “polygenic scores” (PGSs) instead of “polygenic risk scores” (PRSs). But the authors emphasized that larger numbers of non-European samples are needed, and they demonstrated by simulation that including larger numbers of “admixed” individuals (two or more ethnicities in the same person, which is becoming increasingly common these days) will increase the power of statistical correlations (the larger the N of admixed genomes, the better).
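The "percentile of a normal distribution" mentioned above can be sketched as follows, assuming the scores in a reference population are roughly normally distributed with a known mean and standard deviation. The function name and the numbers are hypothetical. Note that this covers only the percentile step; turning a percentile into a lifetime disease risk needs additional information (such as disease prevalence) and is the hard part.

```python
from statistics import NormalDist


def score_percentile(score, population_mean, population_sd):
    """Percentile (0-100) of a score within an assumed normal reference population."""
    return 100.0 * NormalDist(mu=population_mean, sigma=population_sd).cdf(score)


# Hypothetical reference population: mean score 0.0, standard deviation 1.0.
print(score_percentile(0.0, 0.0, 1.0))             # exactly at the population mean
print(round(score_percentile(2.0, 0.0, 1.0), 1))   # two standard deviations above the mean
```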


From: Nebert, Daniel (nebertdw)
Sent: Wednesday, January 17, 2024 4:29 PM

Polygenic scores (PGSs) are used for combining genetic effects into the individual-level genetic liability of diseases or non-disease traits (e.g., risk of type-2 diabetes or schizophrenia; risk of lung cancer as a function of cigarettes smoked, or skin cancer as a function of arsenic exposure in drinking water). PGSs have attracted substantial research interest — as a result of the recent expansion of genotyped cohort sample sizes, increased appreciation of the polygenicity of complex traits, and recent methodological innovations and advances in PGS training. For some traits, the predictive performance has improved the potential clinical relevance of PGS.

However, most PGS models suffer from limited transferability across populations, despite the fact that some complex traits manifest substantial trans-ancestry genetic correlation (i.e., the genetic effects on a trait are correlated across ancestry groups). The limited transferability is partly due to the underrepresentation of non-European individuals in genetic studies, and it delays the realization of equitable healthcare benefits from advancements in genetic research.

Several efforts are underway to improve the transferability of PGS models. First, active recruitment of non-European individuals in genetic studies, along with global partnerships and capacity building, is increasing significantly. However, most genome-wide association study (GWAS) cohorts do not yet encompass a diversity that proportionally represents global populations. Second, the development of computational methods can complement these efforts and provide immediate benefits to individuals of diverse ancestry groups. Existing efforts include performing PGS modeling by prioritizing variants present in diverse populations and in cell-type-specific regulatory elements, and combining multiple polygenic predictors characterized for multiple ancestry groups.

Admixed individuals (whose genomes consist of haplotypes from more than one ancestry group and account for one in seven newborns in the U.S.) are often excluded in PGS model training, given the technical limitations. Most modern PGS methods apply Bayesian multivariate regression by including GWAS summary statistics and ancestry-matched linkage disequilibrium (LD) reference panels. Although methods of applying GWAS analysis to admixed individuals exist, dependencies on the LD reference panels and computational complexities in representing LD for admixed individuals present challenges in the estimation of variant-effect sizes in PGS modeling.

However, including admixed individuals offers valuable insights into the genomic basis of common complex traits. A recent study indicates that the individual-level PGS performance shows linear decay as a function of genomic distance, defined as the Euclidean distance from the PGS training set on the genotype principal-component analysis (PCA) projection; this highlights the importance of considering the continuum of genomic ancestry in PGS evaluation. Given the substantial trans-ancestry genetic correlation in some complex traits, one might expect that admixed individuals can also offer unique opportunities to train PGS models with improved transferability.
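The distance described above can be sketched in a few lines. This is only an illustration under a stated assumption: the "training set" is summarized here by the centroid (average position) of its members in principal-component space, which may differ from the exact definition used in the cited study. The function names and coordinates are hypothetical.

```python
import math


def centroid(points):
    """Average position of a set of points (one list of PC coordinates per person)."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def genomic_distance(individual_pcs, training_pcs):
    """Euclidean distance from one person to the training-set centroid in PC space."""
    return math.dist(individual_pcs, centroid(training_pcs))


# Hypothetical two-dimensional PC coordinates (illustrative values only).
training = [[0.0, 0.0], [2.0, 0.0]]   # centroid is (1, 0)
near = [1.0, 1.0]
far = [4.0, 4.0]

print(genomic_distance(near, training))
print(genomic_distance(far, training))
```

Under the linear-decay finding quoted above, the person labeled `far` would be expected to get a less accurate score than the person labeled `near`.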

Authors [see attached pdf] presented inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data. This approach is naturally applicable to admixed individuals. Authors validated their approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to N = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvements in individuals of African ancestry (48.9% on average across 60 quantitative traits) and up to 50-fold improvements for some traits (e.g., “neutrophil count”, R² = 0.058) over the baseline model trained on the same number of European individuals.
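For readers wondering what "penalized regression on individual-level data" looks like, here is a toy sketch. It assumes a lasso (L1) penalty fitted by plain coordinate descent on a tiny genotype matrix in which everyone, regardless of ancestry, is pooled into one table. This is not the authors' software, and their penalty and solver may differ; the function names and data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * sum((y - X b)^2) + lam * sum(|b|) one coefficient at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0    # covariance of variant j with the residual that ignores j
            norm = 0.0   # sum of squares of variant j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return beta


# Rows are individuals (all ancestries pooled); columns are two variants.
genotypes = [[1, 0], [0, 1], [1, 0], [0, 1]]
trait = [2.0, 0.1, 2.0, 0.1]

effects = lasso_coordinate_descent(genotypes, trait, lam=0.1)
print(effects)  # the weak second variant is shrunk to exactly zero
```

The penalty is what keeps the model manageable: variants with little evidence get an effect of exactly zero, so only the informative ones remain in the score.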

When authors allowed iPGS to use N = 284,661 individuals, they observed an average improvement of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for “other individuals”. Authors further developed iPGS + refit, to jointly model the ancestry-shared and ancestry-dependent genetic effects when heterogeneous genetic associations were present. For “neutrophil count”, for example, iPGS + refit showed the highest predictive performance in the African group (R² = 0.115), which exceeds the best predictive performance for the White British group(!!) (R² = 0.090 in the iPGS model), even though only 1.49% of individuals used in the iPGS training are of African ancestry. Authors declared that their data show the power of including diverse individuals for developing more equitable PGS models. 😊
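As a quick arithmetic check on the two neutrophil-count values quoted above, the sketch below computes how much larger one R² is than the other, expressed as a percentage. The function name is hypothetical; the two inputs are the figures from the text.

```python
def relative_gain_percent(new_r2, reference_r2):
    """Percent by which new_r2 exceeds reference_r2."""
    return 100.0 * (new_r2 - reference_r2) / reference_r2


# African group, iPGS + refit (0.115) versus White British group, iPGS (0.090).
print(round(relative_gain_percent(0.115, 0.090), 1))  # about 27.8
```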


Am J Hum Genet 2 Nov 2023; 110, 1888–1902
