What Is the Tree of Life? And what does that mean?

A universal Tree of Life (TOL) has long been a goal of molecular phylogeneticists. However, reticulation (i.e. a pattern resembling a net, instead of a “tree” with a single root) at the level of genes––and possibly at the levels of cells and species––renders any simple interpretation of such a TOL problematic (especially as applied to bacteria).

One of the several ways in which microbiology puts the neo-Darwinian synthesis in jeopardy is by threatening to “uproot the Tree of Life” (TOL). Lateral gene transfer (LGT) is much more frequent than most biologists would have imagined until the last 20 years or so; as a result, phylogenetic trees based on sequences of different prokaryotic (bacterial) genes are often different. How to tease out, from such conflicting data, something that might correspond to a single, universal TOL thus becomes problematic. Moreover, because many important evolutionary transitions involve lineage fusions at one level or another, the aptness of a tree (i.e. a pattern of successive bifurcations) as a summary of life’s history remains uncertain.

This review by Doolittle & Brunet [attached] begins with Darwin’s ‘Tree of Life’ hypothesis, moves on to discuss the likelihood of a last universal common ancestor (LUCA), and then turns to “nearly-universal trees” (NUTs), the Forest of Life (FOL), and the Statistical Tree of Life (STOL).

Most gene families are not among the NUTs. Even within a designated species, a sizeable fraction of any genome participates in rapid gene loss and gain (by LGT). The pangenome concept, which aims to describe the gene repertoire of a bacterial species by comparing the gene contents of several to many of its strains, supports this notion. A typical bacterial genome comprises a “core” of genes present in all, or nearly all, of the strains of its species, plus often many more “dispensable” (or “accessory”) genes present in only some strains (or in as few as one).
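
To make the core/accessory distinction concrete, here is a minimal sketch in Python of how a pangenome could be partitioned from per-strain gene lists. The function name partition_pangenome, the core_threshold parameter, and the strain and gene names are all hypothetical and invented for illustration; they do not come from the review, and real pangenome analyses first have to cluster genes into families, which this sketch assumes has already been done.

```python
from collections import Counter


def partition_pangenome(strains, core_threshold=1.0):
    """Split gene families into core and accessory sets.

    strains: dict mapping a strain name to the set of gene families it carries.
    core_threshold: fraction of strains (0 to 1) in which a family must occur
        to count as core. 1.0 means "present in every strain"; a lower value
        models the "nearly all strains" reading. (Illustrative assumption.)
    """
    if not strains:
        raise ValueError("at least one strain is required")

    n_strains = len(strains)
    # Count, for every gene family, how many strains carry it.
    counts = Counter(gene for genes in strains.values() for gene in set(genes))

    pangenome = set(counts)  # every family seen in at least one strain
    core = {g for g, c in counts.items() if c / n_strains >= core_threshold}
    accessory = pangenome - core
    return core, accessory, pangenome


# Toy data, entirely made up: three strains sharing two gene families.
toy_strains = {
    "strain_A": {"geneA", "geneB", "geneC", "geneX"},
    "strain_B": {"geneA", "geneB", "geneC", "geneY"},
    "strain_C": {"geneA", "geneB", "geneZ"},
}

core, accessory, pangenome = partition_pangenome(toy_strains)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
print("pangenome size:", len(pangenome))
```

In this toy case only two families are core, while the pangenome holds six. The same pattern, with a pangenome far larger than any single genome, is what the sequenced bacterial strains show at much greater scale.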

For example, in Escherichia coli (several thousand of whose genomes have been sequenced), the average strain has a genome of ~5,000 genes. Genes shared between all, or almost all, strains number little more than 3,000, but the number of gene families with a representative in at least one E. coli genome approaches 100,000. Prochlorococcus, the world’s most plentiful organism and most important oxygen-provider, shows an average genome content of only ~2,000 genes, but its pangenome has been calculated to contain at least 85,000.

What, then, did the genome of LUCA contain? At one extreme, one might imagine that all contemporary gene families have their coalescents (roots), or direct ancestors thereof, in the genome of LUCA. At the other extreme, one could hold that LUCA as a cell had a “normal” or even smallish prokaryotic genome (1,500 to 5,000 genes), with only 100 or so genes (represented in NUTs) that might potentially have been passed down directly to all, or almost all, contemporary genomes. A third model, favored by some, is to redefine LUCA as “a community”, not a single cell or species. Although it conflates “having common ancestry” with “having a common ancestor,” this model points out the basic conflict implicit in reconstructing the TOL from sequences of genes that disagree.

But we might also ask what warrant there is to believe in such a tree-like history of cells, whether or not it is coupled to molecular phylogenies. Even a tree of cellular lineages is not an unproblematic concept. Biology students have long accepted that incomplete lineage sorting, introgression, and full-species hybridization pose difficulties for the sorts of trees that Darwin might have had us draw. But it is the microbes, with their promiscuous willingness to exchange genes between widely separated branches of any tree, that have most seriously jeopardized the neo-Darwinian synthesis, which is the oversimplified form that is often presented to the public.

PLoS Genet 2016; 12: e1005912
