This should be of interest to some of you, … and perhaps ‘too heavy’ for others. Many single-nucleotide variants (SNVs) in coding regions of the genome differ in important ways from those in noncoding regions (cSNVs vs ncSNVs). Over the past several years, substantial effort has been put into the functional annotation of SNVs and other variants in the human genome. Such annotations can play a critical role in identifying putatively causal variants for a disease or trait among the abundant natural variation that occurs at a locus of interest.
The main challenges in using the various annotations are their large number and their diversity. The authors [ref below] develop an unsupervised approach to integrate these different annotations into one measure of functional importance (Eigen) that, unlike most existing methods, is not based on any labeled training data. Using disease-associated and putatively benign variants from published studies (in both coding and noncoding regions), the authors show that the resulting meta-score has better discriminatory ability than the recently proposed combined annotation-dependent depletion (CADD) score. Across varied scenarios, the Eigen score generally performs better than any single annotation, making it a powerful single functional score that can be incorporated in fine-mapping studies.
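For readers who like to see the idea in code, here is a minimal sketch of one simple way to combine several annotations into a single score without any labeled training data. It is not the authors' procedure: it assumes that weighting each standardized annotation by the leading eigenvector of the annotation correlation matrix is an acceptable stand-in, and the data are simulated. All function names and the toy annotations are hypothetical.

```python
import math
import random

def standardize(rows):
    """Scale each annotation (column) to mean 0 and variance 1.
    Assumes no column is constant (that would divide by zero)."""
    n, k = len(rows), len(rows[0])
    cols = []
    for j in range(k):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        cols.append([(x - mean) / sd for x in col])
    return [[cols[j][i] for j in range(k)] for i in range(n)]

def correlation_matrix(z):
    """Correlation matrix of already standardized annotations."""
    n, k = len(z), len(z[0])
    return [[sum(z[i][a] * z[i][b] for i in range(n)) / n
             for b in range(k)] for a in range(k)]

def leading_eigenvector(m, iters=500):
    """Power iteration; returns a unit-length leading eigenvector."""
    k = len(m)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(m[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def meta_scores(rows):
    """Weight each standardized annotation by the leading eigenvector."""
    z = standardize(rows)
    weights = leading_eigenvector(correlation_matrix(z))
    return weights, [sum(w * x for w, x in zip(weights, r)) for r in z]

# Simulated variants (illustrative only): three annotations track a hidden
# "functional" signal with different noise levels; the fourth is pure noise.
rng = random.Random(0)
variants = []
for _ in range(500):
    f = rng.gauss(0, 1)  # hidden functional signal, never shown to the method
    variants.append([f + rng.gauss(0, 0.5),
                     f + rng.gauss(0, 0.7),
                     f + rng.gauss(0, 1.0),
                     rng.gauss(0, 1)])

weights, scores = meta_scores(variants)
print("annotation weights:", [round(w, 2) for w in weights])
```

In this toy setting the uninformative fourth annotation should receive a much smaller weight than the other three, because it is nearly uncorrelated with them. No labels (disease-associated vs benign) are used at any point.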
Nat Genet Feb 2016; 48: 214–220