These GEITP pages have the overarching theme of gene-environment interactions. This includes evolution of life (archaebacteria, eubacteria, plants, fungi & animals) and, accordingly, evolution of genes (especially in response to environmental stress, e.g. climate, diet, need for food & reproduction of the species). So: HOW DO GENES ORIGINATE? Most of us know about “gene duplication” (one gene is ‘copied’, then one of the two copies begins to mutate and take on other functions, which aid in survival of the organism). For the past ~80 years, the consensus view was that virtually all genes were derived from “ancestral” genes.
However, there is a second form of gene evolution, called “overprinting,” which proposes emergence of genes via expression of alternative open-reading-frames (ORFs) that overlap preexisting genes [see diagram in Fig. 1 of attached review]. [An ORF is a continuous stretch of codons (nucleotide triplets) that begins with a start codon (ATG in the DNA, AUG in the RNA) and ends at a stop codon (TAG, TGA or TAA). An ATG codon within an existing ORF may mark a point at which translation (from RNA into protein) is able to start anew; thus, any stretch of a reading frame that runs from a start codon to the next in-frame stop codon has the potential to be translated.]
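[For readers who like to see a definition spelled out, the ORF description above can be sketched in a few lines of Python. This is only a minimal illustration, not how any genome-annotation tool actually works: the function name `find_orfs`, the `min_codons` cutoff and the toy sequence are all made up for this example, only one strand is scanned, and each ORF is assumed to start at the first ATG that follows the previous stop codon in the same frame.]

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}
START_CODON = "ATG"


def find_orfs(dna, min_codons=2):
    """Scan the three reading frames of one DNA strand for ORFs.

    Returns a list of (start, end, frame) tuples, where start and end are
    0-based positions (end exclusive, stop codon included) and frame is
    0, 1 or 2. min_codons is an arbitrary, illustrative length cutoff.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a new ORF at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF at the in-frame stop
    return orfs


# Toy sequence (hypothetical): a single ATG-AAA-TAG ORF in frame 2.
example = "CCATGAAATAGCC"
print(find_orfs(example))
```

[Note that the TGA at positions 3 to 5 of the toy sequence is ignored: it sits in a different reading frame from the ATG, which is exactly why the same stretch of DNA can, in principle, encode more than one ORF.]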
These new ORFs may be out-of-frame with the preexisting gene, or antisense (i.e. on the opposite DNA strand, and therefore read in the opposite direction along the chromosome) to the preexisting gene; they may also be in-frame with the existing ORF, creating a truncated version of the original gene product. [Recall that DNA is transcribed into precursor messenger RNA (pre-mRNA), which then gets spliced so that only exons are represented in the mature messenger RNA (mRNA), the introns being left behind.] The phenomenon of exonization (i.e. the recruiting of a new exon from non-protein-coding, intronic DNA sequences) also represents a special case of de novo gene birth, in which, for example, (often-repetitive) intronic sequences acquire splice sites through mutation, leading to de novo exons [see diagram in Fig. 1 of attached review]. Interestingly, such de novo exons are frequently found in minor splice variants, which may allow the evolutionary “testing” of novel sequences (by the cell or organism), while retaining the functionality of the major splice variant(s).
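[The three relationships just described (out-of-frame, antisense, in-frame) come down to simple arithmetic on coordinates. The sketch below is a hedged illustration only: the `Orf` record, the `classify_overlap` function and the example coordinates are hypothetical, positions are assumed to be 0-based on the forward strand, and splicing is ignored entirely.]

```python
from typing import NamedTuple


class Orf(NamedTuple):
    start: int   # 0-based position on forward-strand coordinates
    end: int     # exclusive end position
    strand: str  # "+" (forward) or "-" (reverse)


def reverse_complement(dna):
    """Return the sequence of the opposite (antisense) strand, read 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def classify_overlap(existing, new):
    """Describe how a new ORF relates to a preexisting one."""
    if new.start >= existing.end or existing.start >= new.end:
        return "no overlap"
    if new.strand != existing.strand:
        return "antisense"
    # Codons are counted from the 5' end of the ORF: the start coordinate
    # on the "+" strand, the end coordinate on the "-" strand.
    if existing.strand == "+":
        offset = new.start - existing.start
    else:
        offset = existing.end - new.end
    return "in-frame" if offset % 3 == 0 else "out-of-frame"


# Hypothetical preexisting gene of 100 codons on the forward strand.
gene = Orf(0, 300, "+")
print(classify_overlap(gene, Orf(30, 300, "+")))  # internal start, same frame
print(classify_overlap(gene, Orf(31, 121, "+")))  # shifted by one nucleotide
print(classify_overlap(gene, Orf(50, 140, "-")))  # opposite strand
```

[In the first case the new ORF shares the stop codon of the preexisting gene and simply starts later, which corresponds to the truncated version mentioned above.]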
It had once been estimated (using the sequence data available at the time) that the number of unique, ancestral eukaryotic exons might be fewer than 60,000 [Curr Opin Genet Dev 1991; 1: 464], and that the vast majority of proteins belonged to no more than 1,000 families [Nature 1992; 357: 543]; it is now realized that the picture is not that simple. After an entire yeast nuclear genome had been sequenced [Trends Genet 1996; 12: 263], an unexpected abundance of genes lacking any known homologs was discovered. More recently [thanks to whole-genome sequencing (WGS) becoming so easy], five genes specific to Drosophila melanogaster and/or its close relative Drosophila simulans were shown to be absent in other closely related species [Proc Natl Acad Sci USA 2006; 103: 9935], pointing to de novo origination of these genes and lending further support to the phenomenon of de novo gene birth. Interestingly, all five genes were preferentially expressed in the testes of male flies (hmm, apparently none was expressed in testes of female flies?). For more details, please check out the attached review of this intriguing topic. 😊
DwN
PLoS Genet May 2019; 15: e100816