A unified nomenclature for vertebrate olfactory receptors

Olfactory receptors (ORs) are G-protein-coupled receptors used for odor detection. ORs form the largest gene family in vertebrates; a typical mammalian genome harbors ~1000 OR genes and pseudogenes. However, the number of functional OR genes varies enormously among the genomes of different animals, no doubt reflecting the adaptation of organisms to different environments [hence, we have gene-environment interactions]. OR genes are distributed in clusters on most mammalian chromosomes; there are strong indications of a common ancestry for most OR clusters, a feature perhaps related to common cis-regulatory elements. Nevertheless, gene duplication and gene deletion have taken place in each species, making orthology relationships difficult to determine and requiring careful manual curation when gene nomenclature is assigned. Another complexity arises from the high proportion of OR pseudogenes in some organisms, such as human (55% pseudogenes). Thus, assigning orthology-based symbols to the OR gene superfamily is challenging and requires “conceptual translation” of all OR pseudogenes.

For many years, an official nomenclature system has been in place for human ORs that is widely accepted and used by the community. The human nomenclature is based on a sequence similarity classification of the OR repertoire, resulting in 18 families and >300 subfamilies. Symbols consist of the root “OR” followed by a family numeral, subfamily letter(s), and a numeral representing the individual gene within the subfamily. For example, OR3A1 encodes member 1 of family 3, subfamily A; OR7E12P is an OR pseudogene that is member 12 of family 7, subfamily E. A new gene is placed in an existing subfamily if it shows at least 60% protein sequence identity to its best “hit”; family membership requires at least 40% sequence identity at the protein level. This classification system is based on a divergent evolutionary model of the ORs derived from phylogenetic analyses of ORs from multiple species. It is consistent with accepted nomenclature schemes for other multigene families [e.g. the cytochrome P450 (CYP) and UDP glucuronosyltransferase (UGT) superfamilies], which use similar cutoffs. This nomenclature has already been applied to dog, platypus and opossum and is available to the community via a dedicated database, the Human Olfactory Receptor Data Explorer (HORDE).
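The symbol format and the two identity cutoffs described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not code from HORDE or from the article: the names ORSymbol, parse_or_symbol and classification_level are hypothetical, and the regular expression assumes symbols look like the two examples given (OR3A1, OR7E12P), with a trailing "P" marking a pseudogene.

```python
import re
from typing import NamedTuple


class ORSymbol(NamedTuple):
    """Parts of an OR gene symbol (hypothetical helper type)."""
    family: int
    subfamily: str
    member: int
    pseudogene: bool


# Root "OR", family numeral, subfamily letter(s), member numeral, optional "P".
# Assumption: symbols follow the pattern of the examples OR3A1 and OR7E12P.
_SYMBOL_RE = re.compile(r"^OR(\d+)([A-Z]+)(\d+)(P?)$")


def parse_or_symbol(symbol: str) -> ORSymbol:
    """Split a symbol such as 'OR7E12P' into its components."""
    match = _SYMBOL_RE.match(symbol)
    if match is None:
        raise ValueError(f"not a recognised OR symbol: {symbol!r}")
    family, subfamily, member, p_suffix = match.groups()
    return ORSymbol(int(family), subfamily, int(member), p_suffix == "P")


def classification_level(identity_pct: float) -> str:
    """Apply the stated cutoffs to the protein identity with the best hit.

    >= 60% -> same subfamily as the best hit
    >= 40% -> same family as the best hit
    otherwise 'neither' (what happens then is not described in the text).
    """
    if identity_pct >= 60.0:
        return "subfamily"
    if identity_pct >= 40.0:
        return "family"
    return "neither"


print(parse_or_symbol("OR3A1"))
print(parse_or_symbol("OR7E12P"))
print(classification_level(72.5))
```

Note that a subfamily can itself be the letter P under this pattern; the regular expression only treats a "P" that follows the member numeral as the pseudogene marker.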

Use of different nomenclature in different organisms creates difficulties, both when comparing genes across species and, especially, when the same gene is reported more than once under different names. Although this situation is also common with other genes, it becomes especially confusing in large gene families that are found in multiple species, such as the ORs. Next-generation technologies are dramatically increasing the number of sequenced vertebrates; therefore, a unified and widely accepted nomenclature that encodes homology relationships is more important than ever. The authors [see attached article] propose a unified nomenclature system for vertebrate OR genes and pseudogenes. The nomenclature is human-centric (and is therefore based on the human classification system for OR genes). Using a dedicated algorithm (Mutual Maximum Similarity, MMS), the authors applied their nomenclature system to the OR repertoires of mouse, rat, cow, dog, horse, orangutan and chimpanzee, and also to zebrafish, a more distantly related vertebrate species.
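The summary above names the MMS algorithm but does not describe how it works. As a rough guess at what “mutual maximum similarity” implies, the Python sketch below pairs two genes only when each is the other's most similar partner (a reciprocal best hit). This is an assumption for illustration, not the authors' implementation; the function name mutual_best_pairs, the gene labels q1..q3 and h1..h2, and the identity values are all made up.

```python
from typing import Dict, Tuple


def mutual_best_pairs(identity: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """Return query -> human pairs that are each other's best hit.

    `identity` maps (query_gene, human_gene) to percent protein identity.
    Ties are resolved in favour of the pair seen first.
    """
    best_for_query: Dict[str, str] = {}
    best_for_human: Dict[str, str] = {}
    for (query, human), pct in identity.items():
        # Track the most similar human gene for each query gene.
        if query not in best_for_query or pct > identity[(query, best_for_query[query])]:
            best_for_query[query] = human
        # Track the most similar query gene for each human gene.
        if human not in best_for_human or pct > identity[(best_for_human[human], human)]:
            best_for_human[human] = query
    # Keep only pairs where the best-hit relationship holds in both directions.
    return {q: h for q, h in best_for_query.items() if best_for_human[h] == q}


# Made-up identities between three query-species genes and two human genes.
toy_identity = {
    ("q1", "h1"): 92.0,
    ("q1", "h2"): 55.0,
    ("q2", "h1"): 88.0,
    ("q2", "h2"): 61.0,
    ("q3", "h2"): 70.0,
}
print(mutual_best_pairs(toy_identity))  # {'q1': 'h1', 'q3': 'h2'}
```

In this toy example q2 is left unpaired because its best human hit, h1, is more similar to q1. How genes without a mutual best partner are named is not covered in the summary.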

The authors demonstrate that their nomenclature captures the phylogenetic relationships among the studied species and provides a powerful framework for diverse studies of vertebrate ORs. A unified nomenclature for the OR gene family can also serve as a model for other large multigene families, allowing researchers to make cross-species comparisons easily in complex groups of genes. All the nomenclature data are available from the HORDE database (https://genome.weizmann.ac.il/horde/) and are under consideration by the species-specific nomenclature committees that currently use an alternative OR nomenclature (i.e. those for mouse, rat, and zebrafish). The Vertebrate Gene Nomenclature Committee (VGNC) is currently naming genes in chimpanzee, cow, dog and horse, and has indicated that it will adopt this OR nomenclature in these species.

DwN

BMC Evolutionary Biol 2020; 20: 42
