Exomes are those portions of our DNA that get transcribed into messenger RNA (mRNA) and then translated into proteins (gene products). Large-scale reference data sets of human genetic variation will be critical for the medical and functional interpretation of DNA sequence alterations. The amazing report [attached] describes the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries, generated as part of the Exome Aggregation Consortium (ExAC). This catalog of human genetic diversity contains an average of one variant every eight bases(!) of the exome, and provides direct evidence for the presence of widespread mutational recurrence.
The authors have used this catalog to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation. They identified 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. They also demonstrated that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human “knockout” variants in protein-coding genes.
This study and its accompanying database (Suppl. Data online) are noteworthy for two reasons: [1] the sheer number of individuals sequenced, and [2] the depth of coverage, i.e. the number of times each nucleotide in each individual’s exome was sequenced. In the recently completed 1000 Genomes Project, 2,504 genomes were shallowly sequenced, a cost-saving strategy that favored the discovery of common over rare genetic variation. In contrast, each exome in ExAC has been sequenced deeply. Consequently, even genetic variants observed in just one individual can be confidently considered to be real.
More than half of the ~7.5 million variants found by ExAC are seen only once; however, collectively, they occur at a remarkably high density (at one out of every eight sites in the exome). For each gene, the authors contrasted the expected and observed numbers of variants that cause truncated proteins, in order to search for genes containing lower-than-predicted levels of protein-truncating variants. This allowed them to identify several thousand genes highly sensitive to such variants, i.e. unable to function normally after loss of one copy of the gene, even if the other copy is intact (haploinsufficiency). About two-thirds of these genes have not yet been associated with disease(!), but their mutation probably leads to embryonic death or strongly affects fitness in some other way. These genes are also intolerant of variants in regulatory DNA sequences that would markedly alter levels of RNA synthesis from the gene, and are more likely than other genes to be implicated in genome-wide association (GWA) studies of common disease.
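As a rough illustration of the expected-versus-observed comparison described above, here is a minimal sketch in Python. It is not the authors' method (the paper uses a more elaborate probabilistic score); it only computes a plain observed/expected ratio per gene. The gene labels, the counts, and both thresholds (`max_ratio`, `min_expected`) are made-up assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    name: str            # hypothetical gene label, not a real gene
    expected_ptv: float  # expected protein-truncating variants (made-up value)
    observed_ptv: int    # protein-truncating variants actually seen (made-up value)


def obs_exp_ratio(gene: GeneCounts) -> float:
    """Observed / expected count of protein-truncating variants for one gene."""
    if gene.expected_ptv <= 0:
        raise ValueError("expected count must be positive")
    return gene.observed_ptv / gene.expected_ptv


def flag_depleted(genes, max_ratio=0.1, min_expected=10.0):
    """Return names of genes with far fewer truncating variants than expected.

    Genes with a small expected count are skipped, because seeing zero
    variants when only a few are expected is not informative.
    Both thresholds are illustrative assumptions.
    """
    return [
        g.name
        for g in genes
        if g.expected_ptv >= min_expected and obs_exp_ratio(g) <= max_ratio
    ]


# Toy data: three hypothetical genes.
genes = [
    GeneCounts("GENE_A", expected_ptv=40.0, observed_ptv=1),   # strongly depleted
    GeneCounts("GENE_B", expected_ptv=35.0, observed_ptv=30),  # close to expectation
    GeneCounts("GENE_C", expected_ptv=3.0, observed_ptv=0),    # too few expected to judge
]

depleted = flag_depleted(genes)
print(depleted)
```

In this toy example only GENE_A is flagged: GENE_B has nearly as many truncating variants as expected, and GENE_C has too small an expected count for its zero observations to mean much. That second filter is one reason a very large cohort matters for this kind of analysis.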
For cytochrome P450-ologists, CYP26B1 is the only P450 (out of 57 CYP genes in the human genome) on the list. This was found (by David R Nelson) on line 388 of Suppl. Table 13. This finding is particularly mystifying because CYP26A1, CYP26B1 and CYP26C1 all encode enzymes involved in retinoic acid hydroxylation. And all three of these genes are “early embryonic-lethal in mouse”, which would suggest that these three genes (unlinked, i.e. on different chromosomes) would be redundant with one another. One possible explanation is that perhaps each of these genes is very cell-type-specific in its pattern of expression in embryonic stem cells, and/or exhibits time-dependent functions (e.g. “switching on” during gestational day 1, 2 and 3). Another possibility (suggested by David R Nelson) involves haploinsufficiency (loss of one allele causes lethality) for CYP26B1, whereas perhaps CYP26A1 and CYP26C1 are able to function at 50% of normal levels. That these genes encode enzymes would, of course, also fit the criteria for additive inheritance.
Jay Shendure. A deep dive into genetic variation. Nature, 18 August 2016, vol. 536: 277
COMMENT:
Once more, I wish to thank Dan for these stimulating emails. Today I spent a couple of hours reading the paper and searching the Supplemental Tables.
So, it turns out that in addition to CYP26B1, the list contains three more genes involved in the Retinol Metabolism Pathway.
These include:
- RDH10 (retinol dehydrogenase-10), encoding a protein that converts retinol to retinal, and
- ALDH1A1 and ALDH1A3, encoding proteins that metabolize retinal to retinoic acid.
Knockout of either the RDH10 or the ALDH1A3 gene is embryolethal in mice (for ALDH1A3, not until birth). The mouse Aldh1a1(–/–) knockout mice are normal and fertile, but I know why (it will be published from my lab soon). It is surprising (just as with the CYP26A1 and CYP26C1 genes) that the ALDH1A2 gene is not included in the Supplemental Tables of this paper. And now I realize that the possible explanations given by Dan and David R Nelson for the CYP26A1 and CYP26C1 genes may also apply to the ALDH1A2 gene.
Anyway, based on these findings and also on Supplemental Table 16 (line 53), the retinol pathway appears to be a very important pathway during early embryogenesis in the human.
COMMENT:
It is always satisfying (for me) to hear that another of my colleagues … somewhere in the world … who shares these GEITP emails, has been helped out, i.e. something becomes clarified in his/her mind … about one or another research finding that they’ve found difficult to comprehend or explain. 🙂
This encourages me to keep this going. Interestingly, no one seems to want to “opt out” from receiving these emails––which sometimes come as a flood of five or more in one day, other times a drought of no messages for a week or longer––and more and more keep wanting to be included in these GEITP-shared emails.