Architecture of the human interactome defines protein “communities” and disease networks

In the 1960s, things were simple. Genes were made of DNA, which was transcribed into RNA, which was then translated on ribosomes in the cytoplasm into proteins (i.e. the gene products). Then people realized that transcribed RNA includes both exons and “intervening sequences” (now termed ‘introns’) and that, after the introns are spliced out, the open reading frames (ORFs) of the exonic messenger RNA (mRNA) are translated into proteins. The mRNA also contains 5′ and 3′ untranslated regions (UTRs). And “one gene, one enzyme” was found to be too simplistic: one transcribed RNA could lead to dozens of different proteins (through alternative splicing), and posttranslational modifications result in many more distinct end-products.

Until the end of the 1980s, it was generally accepted that a disease gene had one mutant form and one wild-type form (the ‘healthy’ or ‘consensus’ gene). Then it was learned that there could be hundreds of mutant alleles, all affecting a gene’s expression to various degrees (multiple alleles at one genetic locus = ‘polyallelic’). Then we realized there are many synonymous mutations (which do not change the translated amino acid) and many nonsynonymous mutations (which do). Then it was discovered that DNA sequence is not the only means of changing a gene’s expression: epigenetic changes (DNA methylation, RNA interference, histone modifications, and chromatin remodeling), environmental effects, and transgenerational effects can all affect gene expression. In humans, there are about 200 different types of cells (each containing about 20 different types of structures, or “organelles”); since each cell type has its own epigenome, each of us has about 200 epigenomes, all presumably distinct from one another.

Thus, most phenotypes (height, weight, serum level of any given substance, disease, drug efficacy, toxicity) are multifactorial traits, meaning that they reflect the contribution of the DNA sequence (genetics) of hundreds or perhaps thousands of genes, plus epigenetic effects, environmental adversity, and transgenerational effects. All of this together is called the “genomic architecture” of an individual. The next level is “genetic networks,” i.e. various genes being turned on, or off, as a function of time, in a cascade of events. The full set of these interactions is called the “interactome,” which can differ between cells of the same type, as well as between cells of different types.

The physiology of a cell can be viewed as the product of thousands of proteins acting in concert to shape the cellular response. The physiology of the individual, in turn, reflects the product of all these inherited traits acting in concert to shape the organism’s response (a human being has between 30 and 40 trillion cells). Coordination is achieved, in part, through networks of protein–protein interactions that assemble functionally related proteins into complexes, organelles, and signal transduction pathways. Understanding the architecture of the human proteome can inform scientists about cellular, structural, and evolutionary mechanisms, and this is critical to elucidating how genome variation contributes to disease.

In the attached article, the authors present BioPlex 2.0 (Biophysical Interactions of ORFeome-derived complexes), which uses robust affinity purification–mass spectrometry methodology to elucidate protein-interaction networks and co-complexes nucleated by more than 25% of protein-coding genes from the human genome. BioPlex 2.0 is probably the largest such network so far. With more than 56,000 candidate interactions, BioPlex 2.0 contains more than 29,000 previously unknown co-associations and provides functional insights into hundreds of poorly characterized proteins, while enhancing network-based analyses of domain associations, subcellular localization, and co-complex formation.

Unsupervised Markov clustering of interacting proteins was used to identify more than 1,300 protein “communities” representing diverse cellular activities. Genes essential for cell fitness are enriched within 53 “communities” representing central cellular functions. Moreover, the authors identified 442 “communities” associated with more than 2,000 disease annotations, placing numerous candidate disease genes into a cellular framework. This landmark paper shows that BioPlex 2.0 exceeds previous experimentally derived interaction networks in depth and breadth, and will be a valuable resource for exploring the biology of incompletely characterized proteins and for elucidating larger-scale patterns of proteome organization.
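For readers curious how Markov clustering finds “communities” in an interaction network, here is a minimal sketch in Python. It is not the authors' pipeline: the function names, the inflation value of 2.0, and the protein labels (P1 to P6) are assumptions chosen purely for illustration. The idea is to simulate random walks on the network by alternating “expansion” (squaring the transition matrix) with “inflation” (raising entries to a power and renormalizing), so that flow concentrates inside densely connected groups.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (a column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product, adequate for a toy example."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(edges, nodes, inflation=2.0, max_iter=100, tol=1e-9):
    """Toy Markov clustering; `inflation` is an assumed default, not from the paper."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    # Start from the identity so that every node has a self-loop.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        m[index[a]][index[b]] = 1.0
        m[index[b]][index[a]] = 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: take two random-walk steps
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded]
        )  # inflation: strengthen strong flows, weaken weak ones
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical proteins: two tightly connected triplets with no link between them.
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"),
                ("P4", "P5"), ("P5", "P6"), ("P4", "P6")]
communities = markov_cluster(interactions, proteins)
```

In this toy network the two triplets come out as two separate communities. Real analyses work on far larger networks and typically tune the inflation parameter, which controls how finely the network is split.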

Nature 25 May 2017; 545: 505–509
