“Completing” the human genome (this time, for real ??)

From: Nebert, Daniel (nebertdw)
Sent: 9 May 2022 00:25
Subject: “Completing” the human genome (this time, for real ??) #2

The attached articles accompany the “human genome sequence story” that was sent to everyone within the past hour. From left to right: [a] Identification of segmental duplications; [b] Genomic and epigenetic maps of centromeres; [c] The transcriptional and epigenetic state of repeat elements; and [d] Epigenetic patterns throughout the entire completed human genome. 😊 DwN

Science, 1 Apr 2022; 376: 55, 56, 57 and 58

Since its initial release in April of 2000, the human reference genome has covered only the euchromatic fraction of the genome [euchromatin is the lightly packed form of chromatin (DNA, RNA, and protein) that is enriched in genes, and is often (but not always) under active transcription].

This left the important heterochromatic regions (the tightly packed form of chromatin) unfinished. Completing the remaining 8% of the genome [see attached article], the Telomere-to-Telomere (T2T) Consortium now presents the complete 3.055 billion–base pair sequence of a human genome (T2T-CHM13), which: [a] includes gapless assemblies for all chromosomes except Y, [b] corrects errors in the prior references, and [c] introduces nearly 200 million new base pairs (bp) of sequence containing 1,956 new gene predictions, 99 of which are predicted to be protein-coding.

The completed regions [see attached] include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes (those in which the centromere is situated so that one chromosomal arm is much shorter than the other), unlocking these complex regions of the genome so that variational and functional studies can now be carried out.

The current human reference genome was released by the Genome Reference Consortium (GRC) in 2013, and most recently patched in 2019 (GRCh38.p13). This reference traces its origin to the publicly funded Human Genome Project and has been continually improved over the past two decades. Unlike the competing Celera effort, and most modern sequencing projects based on “shotgun” sequence assembly, the GRC assembly was constructed from sequenced bacterial artificial chromosomes (BACs) that were ordered and oriented along the human genome by means of radiation hybrid, genetic linkage, and fingerprint maps.

However, limitations of BAC cloning led to an underrepresentation of repetitive sequences, and the opportunistic assembly of BACs derived from multiple individuals resulted in a mosaic of haplotypes. As a result, several GRC assembly gaps are unsolvable because of incompatible structural polymorphisms on their flanks, and many other repetitive and polymorphic regions were left unfinished or incorrectly assembled.

To finish the last remaining regions of the genome, the authors leveraged the complementary aspects of PacBio HiFi and Oxford Nanopore ultralong-read sequencing to assemble the uniformly homozygous CHM13hTERT cell line (hereafter, CHM13). The resulting T2T-CHM13 reference assembly removes a 20-year-old barrier that had hidden 8% of the genome from sequence-based analysis, including all centromeric regions and the entire short arms of the five human acrocentric chromosomes. The authors describe [see attached] the construction, validation, and initial analysis of a truly complete human reference genome and discuss its potential impact on the field. 😊

COMMENT: The BIG question now becomes: “Because we have knowledge of ‘this last 8% of the genome’ that has been unavailable until now, do all important GWAS need to be repeated?” DwN
COMMENT: Great, many thanks! MI-S
ADDED:
Science, 1 Apr 2022; 376: 44-53
