Virtually since the Human Genome Project began in October 1990, many have voiced the concern that concentrating large-scale genomic data generation on individuals of European ancestry can contribute to healthcare inequalities. In the search for a genetic diagnosis, diagnostic sequencing currently focuses largely on candidate variants in known disease-associated genes that are absent or sufficiently rare in the available control reference cohorts; each such variant is then considered carefully as a possible explanation for the patient's presentation.
Need and Goldstein (2009) specifically argued that, as sequencing becomes clinically routine, our ability to filter variants effectively to identify pathogenic ones would differ greatly among ancestry groups, … unless our knowledge of genetic variation is made more equal across ancestral groups. Now that clinical sequencing is becoming routine, this fear has unfortunately been realized. The common experience is that, when clinical sequencing is performed today in patients of European ancestry, the number of candidate variants is significantly smaller than in patients from other geographic ancestry groups.
When searching for the genetic aberrations responsible for Mendelian disorders, the expectation that disease genotypes will be under strong negative selection directs us to focus on genotypes that are at low frequency or unobserved in the general population. As population reference cohorts grow, we can resolve ever lower allele frequencies. The recently released Exome Aggregation Consortium (ExAC) dataset, which contains aggregated exome sequence data from 60,252 individuals with an assigned geographic ancestry, resolves allele frequencies roughly 6-fold lower than was possible with the two pre-existing datasets combined, the Exome Sequencing Project (ESP) and the 1000 Genomes Project. About 60.9% of the samples in this ExAC reference cohort are of European ancestry, compared with 13.7% of South Asian ancestry, 9.6% of Latino ethnicity, 8.6% of African ancestry, and 7.2% of East Asian ancestry.
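To make the link between cohort size and frequency resolution concrete, the following rough sketch computes the lowest non-zero allele frequency that each ancestry subset could report, namely one allele copy among 2N chromosomes. It assumes diploid autosomal sites with complete coverage, and the per-group sample counts are back-calculated from the rounded percentages quoted above, so they are approximations rather than the official ExAC counts. The function names are illustrative only.

```python
# Approximate per-ancestry resolution of a reference cohort.
# Assumption: counts are derived from the rounded percentages in the text,
# so they only approximate the real per-group sample sizes.

TOTAL_INDIVIDUALS = 60_252

ANCESTRY_SHARE = {
    "European": 0.609,
    "South Asian": 0.137,
    "Latino": 0.096,
    "African": 0.086,
    "East Asian": 0.072,
}


def min_resolvable_af(n_individuals: int) -> float:
    """Frequency of a single allele copy among 2N chromosomes (diploid, autosomal)."""
    return 1.0 / (2 * n_individuals)


def resolution_by_ancestry(total: int, shares: dict) -> dict:
    """Map each group to (approximate sample count, lowest non-zero allele frequency)."""
    result = {}
    for group, share in shares.items():
        n = round(total * share)
        result[group] = (n, min_resolvable_af(n))
    return result


if __name__ == "__main__":
    for group, (n, af) in resolution_by_ancestry(TOTAL_INDIVIDUALS, ANCESTRY_SHARE).items():
        print(f"{group:12s} n ~ {n:6d}   lowest non-zero AF ~ {af:.2e}")
```

Under these assumptions the European subset can resolve frequencies several-fold lower than any other group, which is the imbalance the paragraph describes.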
The authors [see attached article] illustrate how unequal representation of genetic variation can impair current genomic interpretation in individuals of non-European ancestry. Although the findings are not surprising given our understanding of population genetics, they still offer important lessons. First, the data show that it is instructive to assess the allele frequencies of variants found in non-European cases within their matched ancestry group(s). Second, increasing both the diversity of geographic ancestry and the sample size of sequenced reference cohorts greatly ameliorates the problem. Given that sample sizes are about to explode with the U.S. national initiative and other large-scale international sequencing studies, it is vital that we ensure the generation of genomic data is distributed as equitably as possible.
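The first lesson, checking a patient's variants against the matched ancestry group, can be illustrated with a small toy filter. This is a sketch under stated assumptions and not the authors' pipeline: the variant names, allele frequencies, and the 1e-4 cutoff are all hypothetical, chosen only to show how the candidate list changes with the reference group.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    af_by_ancestry: dict  # hypothetical allele frequencies per reference group


def is_candidate(variant: Variant, ancestry: str, max_af: float = 1e-4) -> bool:
    """Keep a variant that is absent or rare in the chosen reference group.

    A group with no data is treated as frequency 0.0, which mirrors the
    problem in the text: missing data is indistinguishable from rarity.
    """
    return variant.af_by_ancestry.get(ancestry, 0.0) <= max_af


def filter_candidates(variants: list, ancestry: str, max_af: float = 1e-4) -> list:
    """Return the names of variants that survive the frequency filter."""
    return [v.name for v in variants if is_candidate(v, ancestry, max_af)]


# Hypothetical variants observed in a patient of African ancestry.
PATIENT_VARIANTS = [
    Variant("var_A", {"European": 0.0, "African": 0.02}),   # common only in African controls
    Variant("var_B", {"European": 0.0, "African": 0.0}),    # unobserved in both groups
    Variant("var_C", {"European": 0.03, "African": 0.03}),  # common in both groups
]

if __name__ == "__main__":
    print("European reference:", filter_candidates(PATIENT_VARIANTS, "European"))
    print("Matched reference: ", filter_candidates(PATIENT_VARIANTS, "African"))
```

In this toy case the European-only reference leaves var_A on the candidate list, whereas the matched reference removes it because it is common in the patient's own ancestry group.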
Genome Biol http://doi.org/bphp (2016)