Early on in genetic studies, the concept (learned in grade school these days 😊) was that “DNA is transcribed into RNA, which is translated into protein.” Then it became established that “DNA → primary transcript, which is processed into messenger RNA (mRNA) via splicing; then mRNA → protein.” The early belief was the “one gene = one gene product (protein)” concept. Now we know that a gene can use many alternative transcription start and termination sites, and that its primary transcript can be alternatively spliced. This leads to numerous mRNA variants that often encode different proteins, many with distinct functions. In addition, through post-translational modification, a protein can be altered by the addition of various groups (e.g. glycosylation, phosphorylation, ubiquitination, acetylation, methylation, ADP-ribosylation, farnesylation, sumoylation, glutathionylation), resulting in dozens or hundreds of final protein products from a single gene…!! The complete set of proteins produced by an organism is called “the proteome,” which is obviously larger and more complex than “the genome.”
The authors [see attached article] set out to characterize the proteomes of a diverse set of representative organisms across the Tree of Life. Including common model organisms for comparison, this collection comprised 19 archaea (archaebacteria), 49 eubacteria, and 32 eukaryotes, for a total of 100 species [archaea and eubacteria are both prokaryotes, which typically carry a single circular chromosome; eukaryotes usually carry multiple linear chromosomes, often in pairs]. The authors also added 14 viruses. They incorporated the latest technological advances into their workflow for high-resolution bottom-up proteomics and implemented a recently developed chip-based chromatographic method. For all prokaryotes, they performed single-run mass spectrometry (MS) analyses, whereas for the more complex eukaryotic samples they used a loss-less prefractionator. They reasoned that the chip-based chromatographic method, combined with the very large data set of more than two million unique peptides, should be well suited to “deep-learning algorithms” (which have recently been shown to be applicable to MS-based proteomics).
With ~2 million peptide identifications (after a protein is enzymatically digested, its smaller fragments of amino-acid sequence are called ‘peptides’) and 340,000 stringent protein identifications, all obtained in a standardized manner, the authors doubled the number of proteins with solid experimental evidence known to the scientific community. These data [see attached article] provide a large-scale case study for sequence-based machine learning, as the authors demonstrate by experimentally confirming the predicted properties of peptides from a simple bacterium, Bacteroides uniformis. The results also offer a comparative view of the functional organization of organisms across the entire evolutionary range.
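To make the digestion step concrete, here is a minimal Python sketch of an in-silico digest. It assumes trypsin as the protease (the usual choice in bottom-up proteomics, though the summary above does not name the enzyme). It also uses the simplified textbook cleavage rule: cut after lysine (K) or arginine (R) unless the next residue is proline (P). The function name `tryptic_digest` and the 7-residue minimum peptide length are hypothetical, illustrative choices, not details from the article.

```python
import re


def tryptic_digest(protein: str, min_len: int = 7) -> list[str]:
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumption (hypothetical helper): trypsin cleaves C-terminal to K or R,
    except when the following residue is P. Missed cleavages are ignored.
    """
    # Zero-width split point: just after K/R, provided the next residue is not P.
    # Splitting on an empty match requires Python 3.7+.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    # Drop empty pieces (e.g. when the sequence ends in K/R) and very short peptides.
    return [p for p in pieces if p and len(p) >= min_len]
```

For example, `tryptic_digest("PEPTIDEKAAKPGGRSAMPLEK")` yields `PEPTIDEK`, `AAKPGGR`, and `SAMPLEK`. The second peptide stays intact because its internal K is followed by P. Real search engines also allow missed cleavages and filter peptides by mass, which this sketch leaves out.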
A remarkably high fraction of the total proteome mass in all kingdoms was found to be dedicated to protein homeostasis and folding, highlighting the biological challenge of maintaining protein structure in all branches of Life. Likewise, a universally high fraction is involved in supplying energy, although the pathways range from photosynthesis through iron-sulfur metabolism to carbohydrate metabolism. In general, however, proteins and proteomes are remarkably diverse between organisms. They can readily be explored and functionally compared at www.proteomesoflife.org.
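The idea of a “fraction of total proteome mass” is simple arithmetic. Each protein contributes roughly its copy number times its molecular mass, and the contributions are summed per functional category and divided by the grand total. The Python sketch below is an illustration under that assumption only. It does not reproduce how the article estimated abundances from MS data. The helper name `mass_fractions` and the toy numbers in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def mass_fractions(proteins: Iterable[Tuple[str, float, float]]) -> dict[str, float]:
    """Return the share of total proteome mass per functional category.

    Each input item is (category, copy_number, molecular_mass_kda).
    Assumption (hypothetical helper): mass contribution = copies * molecular mass.
    """
    totals: dict[str, float] = defaultdict(float)
    for category, copies, mass_kda in proteins:
        # Accumulate the mass contributed by every protein in this category.
        totals[category] += copies * mass_kda
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No measurable mass: no meaningful fractions to report.
        return {}
    return {category: mass / grand_total for category, mass in totals.items()}
```

With toy (made-up) input `[("folding", 100, 60.0), ("energy", 200, 30.0), ("other", 50, 120.0)]`, each category contributes 6,000 kDa, so each receives one third of the total mass. This shows how a few abundant, heavy proteins can dominate a category's share even when the category holds few distinct proteins.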
Nature 25 Jun 2020; 583: 592-596