First came the claim (April 2003) by both Craig Venter of Celera Genomics and the NHGRI-funded Human Genome Project that “the sequence of the human haploid genome had been completed”; in actuality, another 10-15% of (mostly) repetitive DNA had not yet been untangled. As additional haploid genomes were completely sequenced, along with the diploid genomes of single individuals (from several different ethnic groups), it became clear that “the human genome template” can serve as a reference against which all further DNA sequencing is compared, but that it was not the final “absolute answer.” It was also found that there is an incredible amount of diversity, even between the two haploid genomes from the same individual. For example, a comparison of ~2,800 million bases (Mb), out of the approximately 3,300 Mb of total genomic DNA (which includes repetitive, difficult-to-sequence regions), found more than 4 million variants between the two haploid genomes of one individual (one diploid genome) a decade ago [PLoS Biol Oct 2007; 5: e254].
It has become appreciated in recent years that genomes usually contain some non-repetitive sequences that are missing from the reference genome and occur only in a subset of the population. These are called “non-repetitive, non-reference (NRNR) sequences”, and they have remained largely unexplored in terms of characterization and downstream analyses. In the attached article, the deCODE Genetics group used a tool called PopIns to find 3,791 breakpoint-resolved NRNR sequence variants in whole-genome sequence data from 15,219 Icelanders. They found that >95% of the 244 NRNR sequences that are 200 bp or longer are also present in chimpanzees, indicating that they are ancestral.
Moreover, 149 variant loci are in linkage disequilibrium (defined here as a correlation of r² > 0.8) with a genome-wide association study (GWAS) catalog marker, suggesting relevance to human disease. In addition, the authors report an association (P = 3.8 × 10⁻⁸, odds ratio (OR) = 0.92) with myocardial infarction (heart attack) for one particular 766-base pair (bp) NRNR sequence variant. These data underscore the importance of including variation of all complexity levels when searching for variants that might be associated with complex human diseases.
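For readers unfamiliar with the r² > 0.8 threshold, here is a minimal sketch of how r² between two biallelic loci can be computed from phased haplotypes. It is only an illustration of the standard definition, not the authors' pipeline: the function name ld_r2 and the 100 toy haplotypes are made up for this example, with allele 1 at the first locus standing in for an NRNR insertion and allele 1 at the second for a GWAS catalog marker.

```python
def ld_r2(haplotypes):
    """Squared correlation (r^2) between two biallelic loci.

    haplotypes: list of (a, b) pairs, each 0 or 1, giving the alleles
    carried at locus A and locus B on one phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n                # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n                # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a and b) / n    # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                   # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical toy data: insertion and marker allele usually travel together.
toy = [(1, 1)] * 45 + [(0, 0)] * 50 + [(1, 0)] * 3 + [(0, 1)] * 2
r2 = ld_r2(toy)
print(f"r2 = {r2:.2f}; passes the 0.8 threshold: {r2 > 0.8}")
```

In real data the haplotypes would come from phased genotypes, and unphased data would need an estimation step that this sketch leaves out.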
Nature Genet April 2017; 49: 588–593