Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome (i.e. taking into account that each human has pairs of chromosomes, with one allele of each gene on each of the two chromosomes in a pair). This approach should reveal the full range of structural variation across population groups.
The authors [attached publication] report the de novo assembly and haplotype phasing of the genome of one Korean individual, AK1, using single-molecule real-time sequencing, next-generation mapping, microfluidics-based linked reads, and bacterial artificial chromosome (BAC) sequencing. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provided strong support for the robustness of the assembly. By directly comparing the assembly with the human reference genome, the authors identified 18,210 structural variants, including thousands of breakpoints that appear not to have been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population.
Haplotigs (i.e. contiguous stretches of DNA assembled for a single haplotype, here 17-44 million bases in length) were assembled from single-molecule real-time reads that had been assigned to haplotypes on phased blocks, and they covered ~89% of genes. The haplotigs accurately characterized the hypervariable major histocompatibility complex (MHC) region and revealed the allelic configuration of clinically relevant genes such as CYP2D6. [CYP2D6 is expressed variably in humans, with subsets of people showing ultra-high, normal, intermediate, or poor to absent expression; these differences can matter for patients receiving any of six dozen or more prescription drugs used as analgesics or to treat cardiac and psychiatric disorders.]
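One way to read a figure such as "haplotigs covered ~89% of genes" is as the fraction of annotated genes that fall entirely within a single haplotig. The sketch below computes that fraction under stated assumptions: "covered" is taken to mean full containment in one haplotig, coordinates are 0-based half-open intervals, the haplotigs belong to one haplotype and do not overlap on a chromosome, and the coordinates in HAPLOTIGS and GENES are hypothetical toy values, not data from the paper. The function name fraction_genes_covered is likewise illustrative and not the authors' method.

```python
from bisect import bisect_right

# Hypothetical toy data: (chromosome, start, end), 0-based half-open coordinates.
# Assumes the haplotigs come from ONE haplotype and do not overlap per chromosome.
HAPLOTIGS = [
    ("chr6", 0, 17_000_000),
    ("chr6", 17_500_000, 60_000_000),
    ("chr22", 10_000_000, 50_000_000),
]
GENES = [
    ("chr6", 1_000, 5_000),
    ("chr6", 16_999_000, 17_600_000),  # spans the gap between two haplotigs
    ("chr6", 31_000_000, 31_005_000),
    ("chr22", 42_126_000, 42_131_000),
    ("chr1", 100, 200),                # chromosome with no haplotigs
]


def fraction_genes_covered(genes, haplotigs):
    """Return the fraction of genes lying entirely inside a single haplotig."""
    # Group haplotig intervals by chromosome and sort them by start position.
    by_chrom = {}
    for chrom, start, end in haplotigs:
        by_chrom.setdefault(chrom, []).append((start, end))
    for blocks in by_chrom.values():
        blocks.sort()
    starts = {chrom: [s for s, _ in blocks] for chrom, blocks in by_chrom.items()}

    covered = 0
    for chrom, start, end in genes:
        blocks = by_chrom.get(chrom, [])
        # Index of the last haplotig starting at or before the gene start.
        i = bisect_right(starts.get(chrom, []), start) - 1
        # With non-overlapping haplotigs, only that one can contain the gene.
        if i >= 0 and blocks[i][1] >= end:
            covered += 1
    return covered / len(genes) if genes else 0.0


print(f"{fraction_genes_covered(GENES, HAPLOTIGS):.0%}")  # prints 60%
```

Binary search over sorted haplotig starts keeps each lookup logarithmic, which matters when tens of thousands of genes are checked against a whole-genome assembly; a diploid analysis would run the same check once per haplotype.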
The authors feel that their work presents the most contiguous diploid human genome assembly so far, with extensive investigation of unreported and Asian-specific structural variants and high-quality haplotyping of (presumably) clinically relevant alleles for precision medicine (e.g. CYP2D6).
Nature 13 Oct 2016; 538: 243–247