Power of inclusion: Enhancing polygenic prediction with admixed individuals

Polygenic scores (PGSs) combine genetic effects across many variants into an individual-level estimate of genetic liability for diseases or non-disease traits (e.g., risk of type-2 diabetes or schizophrenia; risk of lung cancer as a function of cigarettes smoked, or of skin cancer as a function of arsenic exposure in drinking water). PGSs have attracted substantial research interest as a result of the recent expansion of genotyped cohort sample sizes, increased appreciation of the polygenicity of complex traits, and recent methodological advances in PGS training. For some traits, improved predictive performance has made PGSs potentially clinically relevant.
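At its simplest, a PGS is a weighted sum of an individual's allele dosages, with the weights given by per-variant effect sizes from a trained model. The sketch below assumes purely additive effects and dosages coded 0/1/2; the function name and the toy numbers are illustrative only and do not come from the paper.

```python
import numpy as np

def polygenic_score(dosages, effect_sizes):
    """Return one score per individual: PGS_i = sum_j beta_j * x_ij.

    dosages: (n_individuals, n_variants) alternate-allele counts in {0, 1, 2}
    effect_sizes: (n_variants,) per-variant weights from a trained PGS model
    """
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if dosages.shape[1] != effect_sizes.shape[0]:
        raise ValueError("number of variants must match number of effect sizes")
    return dosages @ effect_sizes

# Toy example (hypothetical numbers): 3 individuals, 4 variants.
X = np.array([[0, 1, 2, 0],
              [1, 1, 0, 2],
              [2, 0, 1, 1]])
beta = np.array([0.10, -0.05, 0.20, 0.03])
print(polygenic_score(X, beta))  # approximately [0.35, 0.11, 0.43]
```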

However, most PGS models suffer from limited transferability across populations, even though some complex traits show substantial trans-ancestry genetic correlation (i.e., correlation of genetic effect sizes across ancestry groups). This limited transferability is partly due to the underrepresentation of non-European individuals in genetic studies, and it delays the equitable realization of healthcare benefits from advances in genetic research.
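To make the definition concrete: trans-ancestry genetic correlation describes how similar per-variant effect sizes are between ancestry groups. The toy simulation below draws effects for two hypothetical groups from a bivariate normal distribution with an assumed correlation of 0.8 and checks that the correlation of the true effects recovers that value. In practice effect sizes are only estimated, with noise that attenuates such a naive correlation, so dedicated estimators are used instead.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_variants = 20_000
rho = 0.8  # assumed trans-ancestry genetic correlation for this illustration

# Per-variant effect sizes for two hypothetical ancestry groups, drawn from a
# bivariate normal with unit variances and correlation rho.
cov = np.array([[1.0, rho],
                [rho, 1.0]])
effects = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_variants)
beta_group_a, beta_group_b = effects[:, 0], effects[:, 1]

# With true (noise-free) effects, the Pearson correlation approximates rho.
observed = np.corrcoef(beta_group_a, beta_group_b)[0, 1]
print(round(observed, 2))  # close to 0.8
```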

Several efforts are underway to improve the transferability of PGS models. First, active recruitment of non-European individuals into genetic studies, along with global partnerships and capacity building, is increasing substantially. However, most genome-wide association study (GWAS) cohorts do not yet capture the vast diversity of global populations in proportion to their representation. Second, computational methods can complement these efforts and provide immediate benefits to individuals of diverse ancestry groups. Existing approaches include PGS models that prioritize variants present in diverse populations and variants in cell-type-specific regulatory elements, as well as methods that combine multiple polygenic predictors developed for different ancestry groups.
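One common way to combine several ancestry-specific predictors is to fit mixing weights by linear regression on a held-out tuning set. The following is a minimal sketch of that idea using ordinary least squares; it is not any particular published method, and the function names and simulated data are hypothetical.

```python
import numpy as np

def fit_mixing_weights(pgs_matrix, phenotype):
    """Least-squares intercept and weights for combining several PGSs.

    pgs_matrix: (n_individuals, n_predictors), one column per ancestry-specific PGS
    phenotype: (n_individuals,) trait values in a held-out tuning set
    """
    design = np.column_stack([np.ones(len(phenotype)), pgs_matrix])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return coef[0], coef[1:]

def combined_score(pgs_matrix, intercept, weights):
    """Apply the fitted mixing weights to new individuals."""
    return intercept + pgs_matrix @ weights

# Simulated tuning data (hypothetical): two PGSs trained in different groups.
rng = np.random.default_rng(1)
n = 5_000
pgs = rng.normal(size=(n, 2))
y = 0.6 * pgs[:, 0] + 0.3 * pgs[:, 1] + rng.normal(scale=0.5, size=n)

intercept, weights = fit_mixing_weights(pgs, y)
print(np.round(weights, 2))  # close to [0.6, 0.3]
```

In practice the tuning set should be separate from both the data used to train each PGS and the final test set, so that the mixing weights are not overfit.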

Admixed individuals (whose genomes consist of haplotypes from more than one ancestry group, and who account for one in seven newborns in the U.S.) are often excluded from PGS model training because of technical limitations. Most modern PGS methods fit Bayesian multivariate regression models to GWAS summary statistics together with ancestry-matched linkage disequilibrium (LD) reference panels. Although methods for GWAS analysis of admixed individuals exist, the dependence on LD reference panels, and the computational complexity of representing LD in admixed genomes, make it difficult to estimate variant effect sizes for PGS models.
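Why the LD reference matters can be seen from a standard identity: with standardized genotypes, the marginal (one-variant-at-a-time) GWAS effect estimates equal the in-sample LD matrix multiplied by the joint effects. Summary-statistic methods must undo this relation using an external LD panel, so a panel that does not match the sample (for example, admixed genomes, whose LD patterns vary with local ancestry) distorts the inferred effects. The simulation below only demonstrates the identity; it is not a Bayesian PGS method, and the shared-factor genotype model is a crude stand-in for real LD.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 2_000, 5

# Correlated "genotypes": every column shares the factor `base`, a crude stand-in for LD.
base = rng.normal(size=(n, 1))
X = base + rng.normal(size=(n, m))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize: mean 0, variance 1

beta_true = np.array([0.3, 0.0, 0.0, -0.2, 0.0])
y = X @ beta_true + rng.normal(size=n)
y = y - y.mean()

# What a GWAS reports: marginal effects, one variant at a time.
beta_marginal = X.T @ y / n
# In-sample LD (correlation) matrix.
R = X.T @ X / n
# Joint least-squares effects, available only with individual-level data.
beta_joint = np.linalg.lstsq(X, y, rcond=None)[0]

# Summary statistics relate to joint effects only through LD.
print(np.allclose(beta_marginal, R @ beta_joint))  # True
```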

Nevertheless, including admixed individuals offers valuable insights into the genomic basis of common complex traits. A recent study showed that individual-level PGS performance decays linearly as a function of genomic distance, defined as the Euclidean distance from the PGS training set in the genotype principal component analysis (PCA) projection; this highlights the importance of considering the continuum of genomic ancestry in PGS evaluation. Given the substantial trans-ancestry genetic correlation in some complex traits, one might expect admixed individuals to offer unique opportunities for training PGS models with improved transferability.
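The genomic-distance idea can be sketched as follows: compute principal components on the training genotypes, project target individuals onto them, and measure each individual's Euclidean distance from the training set. This sketch assumes the distance is taken to the training-set centroid and uses an arbitrary 5 PCs; the study's exact choices (number of PCs, reference point) are not given here, and the two-population toy data are hypothetical.

```python
import numpy as np

def genomic_distance(train_genotypes, target_genotypes, n_pcs=5):
    """Euclidean distance of each target individual from the training-set
    centroid, measured in the training set's top principal components."""
    mean = train_genotypes.mean(axis=0)
    centered = train_genotypes - mean
    # Right singular vectors of the centered training matrix are the PC loadings.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_pcs].T                     # shape (n_variants, n_pcs)
    train_pcs = centered @ loadings
    target_pcs = (target_genotypes - mean) @ loadings
    centroid = train_pcs.mean(axis=0)           # ~0, since training data were centered
    return np.linalg.norm(target_pcs - centroid, axis=1)

# Toy data: two hypothetical populations A and B with diverged allele frequencies.
rng = np.random.default_rng(3)
m = 500
p_a = rng.uniform(0.1, 0.9, size=m)
p_b = np.clip(p_a + rng.normal(scale=0.2, size=m), 0.01, 0.99)

# Training set is mostly A with a minority of B, so PC1 tracks the A-B axis.
train = np.vstack([rng.binomial(2, p_a, size=(900, m)),
                   rng.binomial(2, p_b, size=(100, m))]).astype(float)
test_a = rng.binomial(2, p_a, size=(200, m)).astype(float)
test_b = rng.binomial(2, p_b, size=(200, m)).astype(float)

d_a = genomic_distance(train, test_a)
d_b = genomic_distance(train, test_b)
print(d_a.mean() < d_b.mean())  # True: group B sits farther from the training centroid
```

Under the reported linear-decay pattern, one would expect PGS accuracy to be lower, on average, for individuals with larger distances such as those in group B here.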

The authors [see attached pdf] present inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution of a penalized regression fitted to individual-level data. Because it works directly on individual-level genotypes rather than on summary statistics plus an LD reference panel, this approach is naturally applicable to admixed individuals. The authors validated the approach in a simulation study spanning 33 configurations with varying heritability, polygenicity, and ancestry composition of the training set. When iPGS was applied to N = 237,055 ancestry-diverse individuals in the UK Biobank, it showed the greatest improvement in individuals of African ancestry (48.9% on average across 60 quantitative traits), with up to 50-fold improvements for some traits (e.g., “neutrophil count”, R² = 0.058), relative to a baseline model trained on the same number of European individuals.
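Penalized regression on individual-level data can be illustrated with the lasso (L1 penalty) solved by cyclic coordinate descent. Because the model sees the genotype matrix itself, LD is handled implicitly without any external reference panel, so admixed individuals can simply be included as additional rows. This is a generic textbook sketch, not the authors' implementation or necessarily their exact penalty; it assumes standardized genotype columns and a centered phenotype, and biobank-scale fitting requires considerably more engineering.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200, tol=1e-8):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes columns of X are standardized (mean 0, variance 1) and y is centered,
    so that x_j^T x_j / n = 1 and no intercept is needed.
    """
    n, m = X.shape
    b = np.zeros(m)
    residual = y.astype(float).copy()          # residual = y - X @ b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(m):
            old = b[j]
            # Partial-residual correlation for variant j, with its own effect added back.
            rho = X[:, j] @ residual / n + old
            new = soft_threshold(rho, lam)
            if new != old:
                residual -= X[:, j] * (new - old)
                b[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return b

# Simulated example (hypothetical): 3 causal variants out of 50.
rng = np.random.default_rng(4)
n, m = 1_000, 50
X = rng.normal(size=(n, m))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.zeros(m)
beta_true[[3, 17, 42]] = [0.5, -0.4, 0.3]
y = X @ beta_true + rng.normal(size=n)
y -= y.mean()

b_hat = lasso_coordinate_descent(X, y, lam=0.1)
print(np.flatnonzero(b_hat))  # should include variants 3, 17, and 42
```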

When the authors allowed iPGS to use N = 284,661 individuals, they observed average improvements of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for “other” individuals. The authors further developed iPGS + refit to jointly model ancestry-shared and ancestry-dependent genetic effects when heterogeneous genetic associations are present. For “neutrophil count”, for example, iPGS + refit achieved the highest predictive performance in the African group (R² = 0.115), exceeding the best predictive performance for the White British group(!!) (R² = 0.090, with the iPGS model), even though only 1.49% of the individuals used in iPGS training are of African ancestry. The authors conclude that their data show the power of including diverse individuals in developing more equitable PGS models. 😊
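The neutrophil-count comparison can be expressed as a relative difference in R². The helper below assumes that "improvement" means the relative change in R², which is our reading rather than a definition taken from the paper; it uses only the two R² values quoted above.

```python
def relative_improvement(r2_new, r2_reference):
    """Percent change in R^2 of one model relative to a reference R^2."""
    if r2_reference <= 0:
        raise ValueError("reference R^2 must be positive")
    return 100.0 * (r2_new - r2_reference) / r2_reference

# Neutrophil count, values quoted in the text.
african_refit = 0.115       # iPGS + refit, African group
white_british_best = 0.090  # best White British result (iPGS)
print(f"{relative_improvement(african_refit, white_british_best):.1f}%")  # 27.8%
```

In other words, the African-group R² from iPGS + refit is roughly 28% higher than the best White British R² for this trait.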


Am J Hum Genet 2 Nov 2023; 110, 1888–1902

This entry was posted in Center for Environmental Genetics.