A genetics-led approach defines the drug target landscape of 30 immune-related traits

It is assumed that human genetics can help identify new drug targets. However, the best way to prioritize genes as therapeutic targets remains controversial. Authors [see attached article & editorial] describe a framework to prioritize potential targets by integrating genome-wide association studies (GWAS) data with genomic architecture, development of diseases, and network connectivity. Although their genetics-led drug-target prioritization approach is focused on immune cell-mediated traits, this framework should also be applicable to non-immunologically-mediated diseases.

Authors state that there are two general approaches to prioritize genes from human genetic studies as therapeutic targets. The first is a gene-centric approach. One model (the ‘allelic-series’ model) takes advantage of trait-associated alleles (remember: an allele is one copy of the gene; the other allele represents the second copy of the gene on the other chromosome) to estimate dose–response curves; in this model, the trait-associated alleles could arise from common-variant association studies, rare-variant association studies, or studies of rare Mendelian phenotypes. For common diseases, examples include PCSK9 (proprotein convertase subtilisin/kexin type-9, in coronary artery disease) and TYK2 (tyrosine kinase-2, in immunologically mediated diseases); for rare diseases, examples include CFTR [CF trans-membrane conductance regulator (ABCC7), in cystic fibrosis] and SMN1 (survival of motor neuron 1, telomeric, in spinal muscular atrophy).

Another gene-centric model nominates individual genes that arise from GWAS by using genomic features and other annotations, such as phenotype hierarchy. A benefit of this ‘genomic-features’ model is that it addresses a common difficulty: many GWAS signals arise from non-coding regions, which makes prioritizing individual genes from an implicated region difficult. An additional benefit of the genomic-features model is that it does not require more than a single trait-associated allele (which is usually the case for GWAS).

A second approach is to build networks, or pathways, based on connectivity among ‘seed genes’ implicated by human genetics and then to expand the network to include non-seed genes by using orthogonal data such as protein–protein interactions. Seed genes can originate from either the allelic-series model or the genomic-features model. An advantage of this pathway-centric approach is that many potential targets do not contain naturally occurring variants that disrupt gene function — yet they are still associated with a relevant trait.
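The pathway-centric idea described above can be sketched in a few lines. This is only an illustrative toy, not the authors' pipeline: the function name `expand_network`, the gene labels and the interaction list are all hypothetical, and the expansion is limited to direct interaction partners.

```python
from collections import defaultdict


def expand_network(seed_genes, interactions):
    """Return the seed genes plus their direct interaction partners.

    seed_genes   -- set of gene symbols implicated by human genetics
    interactions -- iterable of (gene_a, gene_b) pairs, e.g. protein-protein
                    interactions (hypothetical data in this sketch)
    """
    neighbors = defaultdict(set)
    for gene_a, gene_b in interactions:
        neighbors[gene_a].add(gene_b)
        neighbors[gene_b].add(gene_a)

    expanded = set(seed_genes)
    for gene in seed_genes:
        expanded |= neighbors[gene]  # add one-step partners of each seed
    return expanded


# Hypothetical toy data: two seed genes and three interaction pairs.
seeds = {"GENE_A", "GENE_B"}
ppi = [("GENE_A", "GENE_C"), ("GENE_B", "GENE_D"), ("GENE_E", "GENE_F")]

non_seed_candidates = expand_network(seeds, ppi) - seeds
print(sorted(non_seed_candidates))  # ['GENE_C', 'GENE_D']
```

GENE_C and GENE_D carry no genetic evidence of their own in this toy; they enter the network only through their connection to a seed gene, which is the point of the pathway-centric approach.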

Authors [see attached article] developed the priority index pipeline, taking (as inputs) GWAS variants for specific immune traits. These variants are predominantly regulatory, commonly act at a distance, and are often context-specific (pertaining to a distinct case). Authors used genomic predictors to identify and score genes likely to be responsible for GWAS signals (denoted as ‘seed genes’), based on: [a] genomic proximity to a disease-associated single-nucleotide variant (SNV), accounting for linkage disequilibrium (non-random association of alleles at different loci in a given population, due to inheritance) and genomic organization; [b] physical interaction, as evidenced by chromatin conformation in immune cells; and [c] modulation of gene expression, evidenced by expression quantitative trait loci (eQTL) in immune cells.
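As a rough illustration of how three kinds of genomic evidence might be combined into a single gene score, here is a minimal sketch. It is not the authors' actual scoring function: the equal weights, the 0-to-1 evidence scale, the function name `seed_gene_score` and the example genes are assumptions made purely for illustration.

```python
def seed_gene_score(proximity, conformation, eqtl, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three evidence scores, each expected in [0, 1].

    proximity    -- evidence from genomic proximity to a disease-associated SNV
    conformation -- evidence from chromatin conformation (physical interaction)
    eqtl         -- evidence from eQTL data (modulation of gene expression)
    The weights are hypothetical; equal weighting is an assumption.
    """
    evidence = (proximity, conformation, eqtl)
    for value in evidence:
        if not 0.0 <= value <= 1.0:
            raise ValueError("evidence scores must lie between 0 and 1")
    return sum(w * v for w, v in zip(weights, evidence))


# Hypothetical genes with made-up evidence values.
candidates = {
    "GENE_A": (1.0, 0.0, 0.5),
    "GENE_B": (0.5, 0.5, 1.0),
    "GENE_C": (0.0, 0.0, 0.5),
}

ranked = sorted(candidates, key=lambda g: seed_gene_score(*candidates[g]), reverse=True)
print(ranked)  # ['GENE_B', 'GENE_A', 'GENE_C']
```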

Authors demonstrate how their genetics-led drug-target prioritization approach (the priority index) successfully identifies current therapeutics, predicts activity in high-throughput cellular screens, enables prioritization of underexplored targets, and allows for determination of target-level trait relationships. The priority index is an open-access, scalable system accelerating early-stage drug target selection for immune-mediated disease. It should also be applicable for other forms of human complex diseases. 😊


Nat Genet July 2019; 51: 1082-1091 & pp 1073-1075 (News’N’Views editorial)

Posted in Center for Environmental Genetics | Comments Off on A genetics-led approach defines the drug target landscape of 30 immune-related traits

Recessive gene disruptions in autism spectrum disorder

These GEITP pages have often chatted about autism spectrum disorder (ASD) — because it represents a mysterious complex disease that is clearly multifactorial (i.e. the trait represents contribution of genes, epigenetic effects, environmental factors, and perhaps endogenous influences and the gut microbiome). Although the parameters for diagnosing ASD have changed (thereby allowing many more patients to be ‘classified’ today as having ASD), the incidence of the disorder also seems to be increasing dramatically, compared with that of 50 years ago — suggesting environmental contributions (e.g. Western diet, lifestyle, excessive TV- and cell phone-watching, physical inactivity among our young). In the 1960s-70s, for example, I saw only one patient with a definitive diagnosis of ASD (out of perhaps 10,000 patients I had contact with), whereas today the frequency is reported to be as high as one in 60 children. ☹

Genome-wide association studies (GWAS) and whole-genome sequencing (WGS) studies strongly implicate both common variants and rare de novo variants in ASD. Recessive mutations have also been suggested. Authors [see attached article] performed a systematic analysis of whole-exome sequencing (WES) data from the Autism Sequencing Consortium, representing 2,343 affected and 5,852 unaffected individuals. Authors classified a total of 696,143 autosomal loss-of-function (LOF; ‘autosome’ is any chromosome that is not a sex chromosome) events (representing 28,685 unique variants in 11,745 unique genes) that introduce a stop codon or disrupt a canonical splice site (both of which mess up the mRNA and therefore the protein). After excluding common variants (i.e. allele frequency >1%), there were 84,645 rare LOF events (27,648 unique alleles) for an average of ~10 LOF mutations per individual. After computational phasing, authors found 298 events (after filtering to exclude common variants), which are consistent with complete gene knockouts (homozygous or compound heterozygous LOF mutations), affecting 266 patients.
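The “~10 LOF mutations per individual” figure can be checked directly from the counts quoted above. The short calculation below uses only numbers given in the text; the variable names are ours.

```python
# Counts quoted in the summary above.
affected = 2343
unaffected = 5852
rare_lof_events = 84645

individuals = affected + unaffected
rare_lof_per_individual = rare_lof_events / individuals

print(individuals)                        # 8195
print(round(rare_lof_per_individual, 1))  # 10.3
```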

Affected individuals were disproportionately more likely to harbor a gene knockout (62% more likely compared with unaffected individuals). To control for possible differences in population stratification and family structure (e.g. founder effects and/or consanguinity in the Finnish and Middle Eastern cohorts), authors also normalized to the background burdens of bi-allelic synonymous variants (alteration in a nucleotide but no change in an amino acid). ASD individuals continued to exhibit higher knockout rates after normalization. Based on the observed ascertainment differentials between affected and unaffected individuals, these burdens predict a contribution of bi-allelic LOF alleles to ~1–2% of ASD cases.

Authors documented bi-allelic disruption of known or emerging recessive neurodevelopmental genes (CA2, DDHD1, NSUN2, PAH, RARB, ROGDI, SLC1A1, USH2A) — as well as other genes not previously implicated in ASD — including FEV (FEV transcription factor, ETS family member), which encodes a key regulator of the serotonergic circuitry. These data refine estimates of the contribution of recessive mutations to ASD and suggest new paths for understanding and illuminating previously unknown biological pathways responsible for this complex disease. When the emphasis on “causation of ASD” has almost always been on “polygenic multifactorial inheritance,” here comes a “show-stopper” — a study that reminds us to be humble, i.e. sometimes this very complex disease can be caused by one, or a very small number of, genes. ☹


Nat Genet July 2019; 51: 1092-1105


High-throughput identification of human SNPs affecting regulatory element activity

Frequently these GEITP pages have chatted about gene expression, which is constantly being altered by environmental signals; these signals stimulate a cascade of downstream reactions which of course begin by tickling the regulatory elements that control gene expression, making the various genes go up or down in their expression. About 85 million single-nucleotide variants (SNVs) have been identified in human genomes so far and, of course, that number will continue to rise as increasingly more genomes are sequenced. [With the first publication of the ‘complete human genome’ in April 2003 (which was anything BUT complete), I recall one colleague (who shall remain unidentified), asking me, “Is that it? The total mutations in all human genomes are 1400?”]

The vast majority of these SNVs are located in noncoding regions (i.e. DNA stretches where no mRNA or protein will result), and each typical human genome has ~500,000 SNVs, with non-reference alleles overlapping regulatory elements such as enhancers [DNA sequences that increase the level of transcription of a gene that is (almost always) located nearby, on the same chromosome] and promoters [regions of DNA (usually ~100–1000 base-pairs in length) that lead to initiation of transcription of a particular gene; promoters are located near the transcription start-sites of genes, ‘upstream’ on the DNA strand]. It has become increasingly clear that these noncoding SNVs can have a substantial impact on gene regulation, thereby contributing to phenotypic diversity (variation in a trait) and being associated with a wide range of human disorders.

Genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) mapping can identify candidate SNVs that may drive a particular trait or disorder (e.g. height, body mass index, cancer, obesity, Alzheimer disease), or the expression level of individual genes, respectively. Unfortunately, even the largest GWAS and eQTL studies rarely achieve single-SNV resolution, due largely to linkage disequilibrium (LD; i.e. the non-random association of alleles at different loci in a given population). In practice, tens to hundreds of linked SNVs are correlated with a single trait. Although new fine-mapping techniques, integration with epigenomic data, deep-learning computational techniques, and GWAS of extremely large populations can help to achieve higher resolution, pinpointing the causal SNVs remains a major challenge.
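To make the LD problem concrete, here is a minimal sketch of the standard r² statistic for two biallelic loci, computed from phased haplotypes. The toy haplotypes are made up; the function name `ld_r_squared` is ours. When r² is close to 1, the two SNVs are almost always inherited together, so an association signal cannot tell them apart.

```python
def ld_r_squared(haplotypes):
    """Compute r^2 between two biallelic loci from phased haplotypes.

    haplotypes -- list of (allele_at_locus_1, allele_at_locus_2) tuples,
                  with each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n       # frequency of allele 1 at locus 1
    p_b = sum(b for _, b in haplotypes) / n       # frequency of allele 1 at locus 2
    p_ab = sum(a * b for a, b in haplotypes) / n  # frequency of the 1-1 haplotype

    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denominator


# Toy data: alleles always travel together (perfect LD).
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
# Toy data: all four haplotypes equally common (no LD).
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r_squared(linked))    # 1.0
print(ld_r_squared(unlinked))  # 0.0
```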

Having a list of all SNVs in the human genome that have the potential to alter gene regulation would lessen this problem. Ideally, the regulatory impact of SNVs would be measured directly. Two high-throughput methods have been employed for this purpose. [1] First, changes in chromatin features (such as DNase sensitivity and various histone modifications) have been mapped in lymphoblasts or primary blood cells derived from sets of human individuals with fully sequenced genomes. Here, the chromatin marks serve as proxies to infer effects on regulatory elements, with the caveat that a change in regulatory activity may not always be detected as a change in chromatin state, or vice versa. Furthermore, many traits do not manifest themselves in blood cells, and other cell-types are more difficult to obtain for epigenome mapping. ☹

[2] An alternative functional readout is to insert DNA sequence elements carrying each allele into a reporter plasmid. Upon transfection of these plasmids into cells, the promoter or enhancer activity of these elements can be measured quantitatively. Different cell-types may be used as models for corresponding tissues in vivo. Large-scale versions of this approach are referred to as massively parallel reporter assays (MPRAs), and they have been used to screen tens of thousands of SNVs. Each of these studies has yielded tens to, at most, several hundreds of SNVs that significantly alter promoter or enhancer activity. Because these MPRA studies cover only a tiny fraction of the genome, it is likely that many more SNVs with regulatory impact remain to be discovered.

Authors [see attached article] surveyed the regulatory effects of 5.9 million SNVs in two different cell-types, providing a resource that helps to identify causal SNVs among candidates generated by GWAS and eQTL studies. Authors leveraged the throughput and resolution of the survey of regulatory elements (SuRE) reporter technology to survey the effects of these 5.9 million SNVs (including 57% of the known common SNVs) on enhancer and promoter activity. They identified more than 30,000 SNVs that alter the activity of putative regulatory elements, usually in a cell-type-specific manner. Integration of this dataset with GWAS results should help pinpoint SNVs that underlie human traits.


Nat Genet July 2019; 51: 1160-1169


Advances in epigenetics link genetics to the environment and disease (Nice Review)

Biologists have long sought to understand how a fertilized egg can form an organism composed of hundreds of specialized cell-types, each expressing a defined set of genes. Same with an acorn: how does this little nut form an oak tree, complete with leaves, branches, trunk and roots? Cellular identity is now accepted to be the result of the expression of specific combinations of genes. This expression pattern must be established and maintained: two distinct, but interconnected, processes. The capacity of a single fertilized egg (zygote) to establish innumerable distinct cell-types depends to a large extent on the coordinated deployment of hundreds of transcription factors that bind to specific DNA sequences, which then activate, or repress, transcription of cell lineage genes.

This establishment phase corresponds most closely to what is generally cited as the first definition of epigenetics (by Conrad Waddington), namely “the study of the mechanisms by which the genotype produces the phenotype (trait) in the context of development.” The maintenance phase often involves a plethora of non-DNA sequence-specific chromatin cofactors that set up and maintain chromatin states via cell division, and — for extended periods of time — sometimes in the absence of the initial transcription factors. This phase is more akin to a definition of epigenetics (put forward by Nanney, and then elaborated on by Riggs and Holliday, and further modified by Bird and others) to mean “the inheritance of alternative chromatin states in the absence of changes in DNA sequence.”

DNA methylation was proposed, early on, as a carrier of epigenetic information — with subsequent work revealing that chromatin proteins (histones) and noncoding RNAs are also important for this process. For example, histone variants and histone modifications can influence local chromatin structure, either directly or indirectly. Such modifications can be heritable, but reversible, and are governed by a series of “writers” (that deposit the modifications), “readers” (to interpret them) and “erasers” (to remove them). Finally, higher-order 3-dimensional chromosome folding is also thought to modulate gene expression and might contribute to inheritance.

This article [see attached] reviews how the epigenetics field has evolved over the last few decades, and details some of the recent advances that are changing our understanding of biology. Authors discuss the interplay between epigenetics and DNA sequence variation, as well as implications of the involvement of epigenetics in cellular memory and plasticity (especially of neurons in the nervous system). Authors also consider the effects of the environment, and of both intergenerational and transgenerational epigenetic inheritance, on biology, disease and evolution (‘intergenerational’ = the effect is passed from a directly exposed parent to its offspring; ‘transgenerational’ = the effect persists in later generations that were never directly exposed). Finally, some new frontiers in epigenetics are presented, with implications for human health. 😊


Nature 25 July 2019; 571: 489-499


CRISPR/Cas9 Whole-Genome Screen Identifies Genes Required for AHR-Dependent Induction of Functional CYP1A

Very central to the topic of gene-environment interactions are endogenous and exogenous “signals,” recognized by the basic-helix/loop/helix (bHLH) transcription factor, aryl hydrocarbon receptor (AHR), followed by activation of numerous genetic and biochemical pathways that represent “responses” to those signals. One of the responses is the up-regulation of cytochromes P450 family 1 (CYP1A1, CYP1A2, CYP1B1), three enzymes that metabolize numerous lipid mediators (e.g. downstream products of arachidonic acid, eicosapentaenoic acid, and docosahexaenoic acid) as well as many polycyclic aromatic hydrocarbons (PAHs; e.g. benzo[a]pyrene, arylamines, and other products of combustion) and halogenated hydrocarbons (e.g. polychlorinated biphenyls, commonly found in toxic waste dump sites).

How many genes are required for AHR-dependent induction of CYP1A? Ligand-bound AHR translocates from cytoplasm to nucleus, where it dimerizes with the aryl hydrocarbon receptor nuclear translocator (ARNT) protein. The AHR/ARNT dimer is then able to bind to enhancer regions of responsive genes to activate transcription. AHR participates in chemical carcinogenesis caused by PAHs — when the three CYP1 enzymes, that are induced by activated AHR in cell-type-specific fashion, generate carcinogenic metabolites of PAHs and arylamines; these induced enzymes also participate in detoxication of carcinogenic and toxic PAHs and arylamines, thus representing a double-edged sword that can be either beneficial or detrimental — depending on genotype, route-of-exposure, size of the exposure dose, time over which the exposure occurs, and organ- and cell-type specificity.

Authors [see attached article] used a particular mouse genome-wide CRISPR/Cas9 library to identify novel genes in the AHR pathway, by taking advantage of a benzo[a]pyrene-selection assay that this lab had originally used (40 years ago..!!) to identify fundamental AHR-pathway genes in mouse hepatoma cells. In addition to the mouse Ahr, Arnt, and Cyp1a1 genes [Cyp1a2 & Cyp1b1 are not expressed in these hepatoma cells], authors identified additional putative AHR-pathway genes; these included P450 oxidoreductase (Por) and five genes in the heme biosynthesis pathway: 5’-aminolevulinate synthase-1 (Alas1), hydroxymethylbilane synthase (Hmbs), uroporphyrinogen decarboxylase (Urod), coproporphyrinogen oxidase (Cpox), and ferrochelatase (Fech). These last five genes are credible candidates, because heme is an essential prosthetic group of all cytochrome P450 proteins.

These experiments demonstrate the power of high-sensitivity CRISPR/Cas9 library genome-wide genetic screening. This paper also shows a clever approach for identifying genes in any pathway that includes a “sensor” gene which detects an endogenous or exogenous signal, and subsequently all downstream genes that participate in the response to that signal. 😊


Toxicol Sci. 2019 Aug 1;170(2):310-319

Posted in Gene environment interactions | Comments Off on CRISPR/Cas9 Whole-Genome Screen Identifies Genes Required for AHR-Dependent Induction of Functional CYP1A

Artificial Intelligence (AI) used to test evolution’s oldest mathematical model

I cannot claim to be an expert on artificial intelligence (AI) or machine-learning, but I would say that the essence of this approach is as follows:

Many things in science (more so in biology than perhaps in chemistry, and even less so in physics or mathematics) appear, to the human mind, as “extremely complicated patterns” which humans are unable to fathom or interpret or explain in any objective, nonbiased, quantitative way. To me, this touches on “Chaos Theory” (that branch of mathematics which involves complex systems whose behavior is highly sensitive to slight changes in conditions, such that small alterations can give rise to strikingly different consequences).

Thus, what AI or machine-learning attempts to do is to minimize the bias, to examine these patterns (the greater the number N of observations, the better), and to quantify the data into the least-random (or highest-likelihood) dataset or explanation. Also, the data can be ranked as a gradient, from the highest-likelihood to the lowest-likelihood datasets or explanations. This type of analysis therefore reduces the human bias in the experiment. Testing the Müllerian mimicry theory in Heliconius butterflies [see attached article, which was described in yesterday’s email pasted below] represents an excellent example of an extremely complicated pattern that is much better quantified and analyzed by AI/machine-learning rather than by any test using our human (biased) minds. 😊

Other examples of extremely complicated patterns obviously might include: predictions of meteorological and climate patterns; human complex diseases (e.g. autism spectrum disorder, major depressive disorder, hypertension); phenotypic heterogeneity seen in the response to a drug; phenotypic heterogeneity in the response to a complex mixture of environmental toxicants (e.g. substantial exposure to a toxic waste dump site).


Sci Advanc 14 Aug 2019; 5: eaaw4967

Posted in Evolution and genetics | Comments Off on Artificial Intelligence (AI) used to test evolution’s oldest mathematical model

HGNC Summer 2019 NewsLetter and DR Nelson Blog on “Seeing Red”

Pasted below is the Summer 2019 NewsLetter for the HUGO Gene Nomenclature Committee (HGNC) — which has now combined forces with the Vertebrate Gene Nomenclature Committee (VGNC). Below that is their first blog, this one by that “famous gene superfamily curator”, David R Nelson.


The HGNC is 40!

We are excited to announce a significant birthday for the HGNC – it is now 40 years since the first full human gene nomenclature guidelines were published, following discussions at the 1979 Human Genome Meeting in Edinburgh, Scotland. We are proud to be able to call ourselves one of the longest standing biocuration projects and we look forward to taking the project into its fifth decade. The whole team celebrated with cake!

The addition of public curator notes to Symbol Reports

We have added a new field to the core data section of our Symbol Reports entitled ‘Curator Notes’. This field allows us to add free text to our Reports to clarify certain aspects of the symbol, name or locus type. For example, the Symbol Report for GULOP displays the text: ‘This pseudogene has been named based on its functional ortholog in another species’ as GULOP has no human parent gene, but is named relative to the protein coding mouse ortholog Gulo; the Symbol Report for C9orf72 displays the text ‘The community currently endorses the use of this placeholder symbol. HGNC has no current plans to update this symbol without community consensus’ because this symbol appears in such an overwhelming number of recent publications.

Changes to our withdrawn entries

We have recently improved the way we display and support searching of our withdrawn entries. These changes have been made in preparation for the upcoming withdrawal of HGNC phenotype entries, as we no longer approve symbols for phenotypes. All requests for new phenotype symbols should be directed to OMIM.

To make withdrawn entries obvious we have added a red warning triangle followed by the text ‘This record has been withdrawn by HGNC’ to the top of each relevant Symbol Report. We have added a new field ‘Symbol Status’ which displays the text ‘Entry Withdrawn’, text which instead used to appear in both the approved gene name and the locus type fields. This means we are now able to display the former approved gene name and locus type for these entries, which were previously missing. Withdrawn entries may also contain the curator notes field, which we plan to use to provide information on the withdrawn phenotype entries in the future.

The gene symbols of withdrawn entries were also previously displayed with the term ~withdrawn appended after the symbol, which meant that in order to find withdrawn entries on genenames.org it was necessary to add a wildcard when using our ‘Search all’ function, i.e. the search term BLYM* was necessary to return BLYM~withdrawn. You can now find the correct withdrawn entry using the former approved symbol. To allow filtering of withdrawn entries, we have added the facet ‘Filter by gene entry status’ to our search function. If search results contain withdrawn entries, this filter gives the option ‘Entry withdrawn’ in addition to ‘Approved’, e.g. searching with the root symbol CYP* shows that there are 3 withdrawn entries, in addition to the 131 approved entries.
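The wildcard behaviour described above can be illustrated with ordinary glob-style matching. This is only an analogy using Python's standard `fnmatch` module; it is not how the genenames.org search is implemented, which the newsletter does not describe.

```python
from fnmatch import fnmatchcase

# The symbol as it used to be stored for a withdrawn entry.
old_stored_symbol = "BLYM~withdrawn"

# An exact search term did not match the suffixed symbol ...
print(fnmatchcase(old_stored_symbol, "BLYM"))   # False
# ... whereas a trailing wildcard did.
print(fnmatchcase(old_stored_symbol, "BLYM*"))  # True

# With the suffix removed, the former approved symbol matches directly.
new_stored_symbol = "BLYM"
print(fnmatchcase(new_stored_symbol, "BLYM"))   # True
```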

Macaque (and a little mouse lemur) now in VGNC!

Regular readers may remember from our last newsletter that we were in the process of adding approved gene symbols for rhesus macaque into vertebrate.genenames.org. We are happy to announce that we have now added this species to our project and have already approved an impressive 10,989 macaque gene symbols! You can browse through these symbols by choosing the Gene Symbol Reports dropdown from the Gene Symbol Data tab and selecting ‘Macaque’ from the Species filter on the left-hand side. You can download all macaque data by visiting the VGNC Statistics & Downloads files and selecting ‘Macaque’ from the Species dropdown box at the top of the page.

We have also added a small number of gene symbols for the (extremely cute) mouse lemur species. All of these genes are part of the cytochrome P450 (CYP) family and were manually curated by our CYP expert, David Nelson. We are going to be adding CYP genes for other primate species in the near future, so please watch this space. David kindly provided a recent blog post for us called ‘Seeing Red’ about two separate CYP genes that allow some fish (CYP27C1) and birds (CYP2J19) to see in infra-red, and also allow some birds to have red plumage, beaks and/or legs [pasted below].

Progress on replacing placeholder symbols

Renaming placeholder symbols to provide more informative nomenclature that is transferable across species continues to be a priority for the HGNC. Here are some examples of placeholders that we have renamed in the last couple of months, along with links to their renamed VGNC orthologs, which all have the same informative symbol as human:

C6orf222 to BNIP5, BCL2 interacting protein 5 (VGNC orthologs: chimp, cow, dog, horse, cat).
C6orf203 to MTRES1, mitochondrial transcription rescue factor 1 (VGNC orthologs: chimp, cow, dog, horse).
C12orf81 to TMDD1, transmembrane and death domain 1 (no orthologs identified via the VGNC pipeline at this time).
FAM57A to TLCD3A, TLC domain containing 3A (VGNC orthologs: chimp, cow, dog, horse, cat, macaque).
FAM57B to TLCD3B, TLC domain containing 3B (VGNC orthologs: chimp, cow, cat).
FAM173A to ANTKMT, adenine nucleotide translocase lysine methyltransferase (VGNC orthologs: chimp, cow, cat, macaque).

Gene Symbols in the News

Two of our symbols have appeared in news articles about treating disease using therapies targeted to specific genes and their products. Gene therapy is already successfully being used to treat hereditary transthyretin-mediated amyloidosis: RNA interference of mutated TTR (transthyretin) mRNA prevents the build-up of toxic TTR protein in carriers and has allowed a surgeon with the disease to continue his career. There is hope for the future development of an effective therapy to treat patients with ALS caused by expansion repeats in the now-famous C9orf72 gene: research in yeast found that inhibiting the activity of the RPS25 protein could halt the accumulation of toxic proteins produced from the repeat expansions in the C9orf72 promoter region. This means that an RPS25-targeted treatment may one day be possible if the same effects are reproducible for humans.

There have also been several articles linking genes to incidence of disease. Pseudogenisation of the CMAH gene (approved gene symbol CMAHP; note the ‘P’ for pseudogene in the gene symbol as a result) in humans could explain why we are more susceptible to heart attacks than other great apes. The loss of the encoded CMAH enzyme means that humans do not produce N-glycolylneuraminic acid and, as a result, when humans consume dietary sources of this sialic acid from red meat, there may be an increased immune response leading to increased inflammation and possibly a higher likelihood of atherosclerosis. A mutation in the MEMO1 gene that may affect the development of the cortex has been linked to autism spectrum disorder. Researchers have found that roughly half of all people carry a mutation in the CLTCL1 gene that has been associated with an increased rate of glucose clearance and a resulting decreased risk of type-2 diabetes. The protective CLTCL1 variant increased within the human population during the advent of cooking.

Seeing Red

HGNC, VGNC, Guest Post · 12 Jul 2019

This guest blog post was written by David R Nelson, one of our external advisors who specializes in the biology of the cytochrome P450s. David is a Professor at the University of Tennessee and has been studying the evolutionary history of cytochrome P450s in species from across the tree of life for over 30 years. The cytochrome P450s are a family of genes that code for enzymes important for metabolism. They have roles in many different metabolic processes, for example, cholesterol synthesis and drug metabolism.

CYP27C1 is a cytochrome P450 in the mitochondrial clan. This clan was formed uniquely in animals by a mistargeting event sending a P450 to the mitochondrial inner membrane [Nelson, et al., 2013].

Most P450s are found in the ER membrane. Other vertebrate members of the mito clan include the CYP11 and CYP24 families. Until 2015, CYP27C1 was an orphan P450, with no known function. It has two close relatives: CYP27A1 is a sterol 27-hydroxylase in bile acid synthesis and CYP27B1 is 25-hydroxy-vitamin D3 1α-hydroxylase that forms 1-alpha,25-dihydroxyvitamin D3, the active form of vitamin D.

A novel function was found for CYP27C1 in zebrafish [Enright et al., 2015]. It desaturates the photoreceptor chromophore precursor vitamin A1 (the precursor of 11-cis-retinal) into vitamin A2 (the precursor of 11-cis-3,4-didehydroretinal) by forming a new double bond in the ring. This increases the conjugation of double bonds in the molecule and extends the sensitivity to longer wavelength infra-red light. Thus, CYP27C1 extends the visual range of zebrafish into the infrared range. The switch from A1 to A2 is called the rhodopsin-to-porphyropsin switch and is seen in many freshwater fish and amphibians, but not salt-water species [Enright et al., 2015], [Morshedian et al., 2017].

A similar spectral shift is achieved by red cone oil droplets in the cone photoreceptors of birds and turtles [Toomey and Corbo, 2017]. A different P450, CYP2J19, makes a red ketocarotenoid pigment found in these oil droplets in the eyes of birds and turtles [Lopes et al., 2016]. This same gene is responsible for the red color of some bird plumage and other parts like beaks and legs [Twyman et al., 2018]. In humans the CYP27C1 gene is expressed in the skin, not the eye [Johnson et al., 2017]. Human CYP27C1 performs the same reaction as CYP27C1 in zebrafish [Kramlinger et al., 2017], but it does not supply visual pigments to the eye. Instead, there are four opsin genes expressed in the skin (OPN1SW, RHO (alias OPN2), OPN3 and OPN5) [Haltaufderhyde et al., 2015]. There may be a role in some type of light-driven signaling involving these skin-expressed opsins and the formation of 11-cis-3,4-didehydroretinal by CYP27C1.


New alcohol-related genes suggest shared genetic mechanisms with neuropsychiatric disorders

Alcoholism is a multifactorial trait reflecting contributions from genes (genotype), epigenetic factors (epigenome), environmental effects (alcohol consumption, frequently accompanied by smoking), endogenous influences (e.g. eventual heart and liver disease), and probably each person’s microbiome (primarily gut flora; contribution of bacterial metabolism). In a study comprising 195 countries and territories, excessive alcohol consumption was found to be responsible for ~2.2% and ~6.8% of age-standardized deaths of women and men, respectively. Most genetic studies of alcohol consumption focus on alcohol dependency, although the population burden of alcohol-related disease mainly reflects a broader range of behaviors associated with alcohol consumption. Small decreases in alcohol consumption could have major public health benefits, including lower rates of mortality.

Unfortunately, genetic studies of alcoholism to date have robustly identified only a small number of associated genetic variants; these include a variant allele (found mainly in East Asians) of the aldehyde dehydrogenase-2 (ALDH2) gene, plus variants in the alcohol dehydrogenase (ADH) gene family, a group of enzymes that participate in the metabolism of alcohols, including a cluster of genes on chromosome 4q23 (ADH1B, ADH1C, ADH5, ADH6 and ADH7). Authors [see attached article] report a genome-wide association study (GWAS) meta-analysis of alcohol intake among individuals of European ancestry [taken from the UK Biobank (UKB), the Alcohol Genome-Wide Consortium (AlcGen), and the Cohorts for Heart and Aging Research in Genomic Epidemiology Plus (CHARGE+) consortia].

The UKB is a prospective cohort study comprising ~500,000 individuals recruited between ages 40 yr and 69 yr; participants were asked to report their average weekly and monthly alcohol consumption through a self-completed touchscreen questionnaire. On the basis of these reports, authors then calculated the alcohol intake in grams per day.
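
As a rough illustration of the grams-per-day calculation described above, the sketch below converts self-reported weekly drink counts into daily ethanol intake. The drink categories and the grams-per-drink values are hypothetical placeholders chosen for illustration; they are not the conversion factors used by the authors.

```python
# Hypothetical grams of pure ethanol per reported drink.
# Illustrative assumptions only; not the study's actual conversion factors.
GRAMS_PER_DRINK = {
    "beer_pint": 16.0,
    "wine_glass": 12.0,
    "spirits_measure": 8.0,
}


def grams_per_day(weekly_drinks):
    """Convert weekly drink counts (drink type -> count) to grams of ethanol per day."""
    weekly_grams = sum(
        GRAMS_PER_DRINK[drink] * count for drink, count in weekly_drinks.items()
    )
    # Average the weekly total over seven days.
    return weekly_grams / 7.0


# Example: seven pints of beer and seven glasses of wine per week.
example_intake = grams_per_day({"beer_pint": 7, "wine_glass": 7})
```

Monthly reports could be handled the same way by dividing a monthly total by an assumed number of days per month.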

The meta-analysis included 480,842 people of European descent and attempted to characterize the genetic architecture of alcohol intake (i.e. the underlying genetic basis of a trait and its properties of variability; phenotypic variation for quantitative traits is, at the most basic level, the result of segregation of alleles at quantitative trait loci). Authors identified 46 new common genetic loci and investigated their potential functional importance, using magnetic resonance imaging (MRI) data and gene expression studies. Note that high on the list of statistical significance is SLC39A8 (P = 1.3 × 10⁻¹⁵), a gene encoding a cation influx transporter (which is expressed in embryonic stem cells, and in every cell type examined except perhaps skin; endogenous SLC39A8 substrates include manganese, zinc, selenium and cobalt). Authors also note that many of the identified genetic pathways are not only associated with alcohol consumption, but are also shared with neuropsychiatric disorders such as schizophrenia.


Nat Hum Behav 29 Jul 2019; doi: 10.1038/s41562-019-0653-z [Epub ahead of print]

Posted in Gene environment interactions

Gene co-expression network-based analysis of multiple brain tissues reveals novel genes and molecular pathways underlying major depressive disorder (MDD)

Major depressive disorder (MDD) is a serious mental health disorder with a global lifetime frequency of ~12% (17% of women, 9% of men). MDD is well known to be a very complex multifactorial trait (i.e. contributions from genetics, epigenetic factors, environmental effects, endogenous influences, and each individual’s microbiome). A recent genome-wide association study (GWAS) meta-analysis (135,458 MDD cases, 344,901 controls, plus two other GWAS totaling 246,363 cases, 561,190 controls) identified 102 independent variants associated with major depression, 87 of which were replicated in an independent sample of 1,509,153 individuals. 😊

Detailed functional studies have shown that many of these loci possess common [i.e. minor allele frequencies (MAFs) greater than 0.01, or >1%] single nucleotide variants (SNVs) that regulate expression of multiple genes in brain tissue, with putative roles in central nervous system (CNS) development and synapse plasticity. Large-scale GWAS have also discovered altered immune pathways; these results suggest that disease-associated SNVs modify MDD susceptibility by changing expression of target genes in a tissue-specific manner. Genes regulate the activity of one another in large co-expression networks. Therefore, SNVs may affect not only the activity of a single target gene, but also the activities of multiple biologically related genes within the same co-expression network, thereby influencing manifestation of a complex trait such as MDD. Integration of GWAS SNV genotype data with gene co-expression network data across multiple tissues may be useful to elucidate biological pathways and processes underlying highly polygenic complex disorders such as MDD.
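
The 1% threshold for a “common” variant mentioned above can be made concrete with a small sketch. It assumes genotypes are coded as the number of copies (0, 1 or 2) of one allele carried by each diploid individual; the function names are illustrative and do not come from the article.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency from per-individual allele counts (0, 1 or 2)."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    # Each diploid individual contributes two alleles.
    allele_freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever of the two alleles is rarer.
    return min(allele_freq, 1.0 - allele_freq)


def is_common(genotypes, threshold=0.01):
    """Classify a variant as common when its MAF exceeds the threshold (1% by default)."""
    return minor_allele_frequency(genotypes) > threshold
```

For instance, a variant carried as a single copy by one person out of 100 has a MAF of 0.005 and would fall below the 1% cutoff.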

Genome-wide gene expression data have been successfully integrated with SNV genotype data to prioritize risk genes and reveal possible mechanisms underlying susceptibility to a range of psychiatric disorders; however, collection of phenotype, SNV genotype, and gene expression data measured from each individual is impeded by cost and tissue availability, and identifying causal variants can be difficult due to linkage disequilibrium (LD; in population genetics, LD is the nonrandom association of alleles at different loci in a given population) and confounders (i.e. from environmental and other factors described above). Recent approaches address these limitations by integrating GWAS summary statistics with independent gene expression data provided by large international consortia [e.g. the multi-tissue Genotype-Tissue Expression (GTEx) project]. The most recent release of the GTEx project (version 7) contains SNV genotype data linked to gene expression across 53 tissues from 714 donors, including 13 brain tissues from 216 donors; this represents a valuable resource for studying gene expression and its relationship with genetic variation, an analysis known as expression quantitative trait loci (eQTL) mapping.
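
Linkage disequilibrium, as defined above, is often summarized by the coefficient D and the squared correlation r². The sketch below computes both for two biallelic loci from a list of phased haplotypes; the input format (pairs of 0/1 allele codes) is an assumption made for illustration and is not taken from the article.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic loci.

    `haplotypes` is a list of (a, b) pairs, where a and b are 0 or 1 and
    indicate which allele each phased haplotype carries at locus A and locus B.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    # D measures how far the joint frequency departs from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom
```

When alleles at the two loci always travel together, r² equals 1, which is why a causal variant can be hard to distinguish from its neighbors; when the loci are independent, both D and r² are 0.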

Building on this framework, authors [see attached article] combined GWAS summary statistics with gene expression information across multiple human brain tissues and whole blood to identify individual risk genes and gene co-expression networks. They developed a novel gene-based method that leverages tissue-specific eQTL information and identified 99 biologically plausible risk genes associated with MDD, of which 58 are newly discovered. Among these novel associations is complement component 4A (C4A), recently implicated in schizophrenia through its role in “synaptic pruning” during postnatal development.

MDD risk genes were enriched in gene co-expression modules in numerous brain tissues, and the implicated gene modules contained genes involved in synaptic signaling, neuronal development, and cell transport pathways. Modules enriched with MDD signals were strongly preserved across brain tissues, but were weakly preserved in whole blood; this finding underscores the importance of using disease-relevant tissues in genetic studies of psychiatric traits. The novel analytical framework [reported herein] should be useful to gain fundamental insights into CNS functioning in MDD and other brain-related traits. Moreover, this study underscores the existence of a “genetic predisposition,” from the time of brain formation in utero, which, when stimulated by appropriate signals (i.e. from epigenetic factors, environmental effects, endogenous influences, and each individual’s microbiome), “pushes the individual ‘over the edge’ and into a full-blown manifestation of MDD.” ☹


PLoS Genet Jul 2019; 15: e1008245

Posted in Gene environment interactions

Evidence that DNA repair genes, a family of tumor suppressor genes, are associated with evolution rate and size of genomes

Consistent with the theme of “evolution” that is often discussed in these GEITP pages, “adaptive radiation” is a well-known phenomenon in evolutionary biology, in which a taxon (a taxonomic group of any rank, such as a class, order, family, genus, or species) diversifies into multiple species that adapt to various environments over a short evolutionary time. Although this phenomenon has been popularized mostly in island studies [e.g. Darwin’s finches, Hawaiian fruit flies (Drosophila)], other major adaptive radiations have occurred [e.g. cichlids (tropical freshwater fish of the family Cichlidae), bats, and cetaceans (marine mammals of the order Cetacea: whales, dolphins, and porpoises)]. It is very likely that common evolutionary and molecular processes are seen in all taxa that have experienced adaptive radiation; however, no such common molecular pathways have been identified to date.

Authors [see attached article] considered living fossils and adaptive radiation as two very different evolutionary strategies: slow evolutionary rate versus rapid evolutionary rate, respectively. Living fossils are characterized by morphological stasis, low (taxonomic) diversity, and rareness; their apparent morphological stability and low diversification suggest highly effective adaptations that decrease the need for phenotypic change, regardless of environmental or genetic changes. Living fossils are frequently referred to as examples of evolutionary success and evolutionary stasis (evolutionary stasis is commonly seen in the fossil record).

Classical examples of taxa considered by most biologists to be living fossils are the crocodilians (crocodiles, alligators), coelacanths (large, bony marine fish with a three-lobed tail fin and fleshy pectoral fins; thought to be related to the ancestors of land vertebrates, and known only from fossils until one was found alive in 1938), and Ornithorhynchus (the genus of the platypus). As with adaptive radiation, no specific genes have been identified as being under selection in living fossil species. Authors attempted to identify any common molecular pathways that contributed to a specific evolutionary process in living fossils vs adaptive radiation species; they were principally interested in genes related to disease, because evolutionary studies may contribute to a better understanding of the function of “disease” genes.

Pathway analysis revealed that DNA repair and cellular response to DNA damage were most important for species that had evolved through adaptive radiation. Nucleotide excision repair and base excision repair were the most significant pathways. In addition, the number of DNA repair genes was found to be linearly related to the genome size and the protein number (proteome) of the 44 animal species analyzed (P < 1.0 × 10⁻⁴). Authors also showed evidence that cancer-related genes play a special role in adaptive radiation species. Note that this study is a completely “dry-lab” bioinformatics analysis, relying on existing databases (i.e. genomes already sequenced for all these 44 species). 😊

DwN

Hum Genomics June 2019; 13: 26

Posted in Center for Environmental Genetics