“DNA.Land” — a framework to collect genomes and phenomes in the present era of abundant genetic information

As genome-wide association studies (GWAS) are published on increasingly large cohorts, it has become clear that virtually all multifactorial traits (e.g. height, body mass index, type-2 diabetes, asthma, cancer, autism spectrum disorder) must reflect hundreds if not thousands of genes, plus epigenetic factors, plus environmental effects over time. Moreover, most genes are “small-effect” contributors to the multifactorial trait being studied, each gene contributing perhaps 0.1%, or even 0.0001%, to the trait (phenotype). This means that the larger the population studied, the more small-effect genes can be identified.
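The link between cohort size and detectable effect size can be made concrete with a rough power calculation. The sketch below is illustrative only and is not taken from the article: it reads "contribution" as the fraction of trait variance explained by a single variant, and it assumes a quantitative trait, the conventional genome-wide significance threshold of 5 × 10⁻⁸, 80% power, and a standard normal approximation. The function name `required_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(variance_explained, alpha=5e-8, power=0.80):
    """Approximate cohort size needed to detect a single variant.

    Assumptions (not from the article): a quantitative trait, an additive
    single-variant test, and the normal approximation
        N ~ (z_alpha + z_power)^2 * (1 - q) / q
    where q is the fraction of trait variance the variant explains.
    """
    if not 0.0 < variance_explained < 1.0:
        raise ValueError("variance_explained must be strictly between 0 and 1")
    std_normal = NormalDist()
    # Two-sided critical value at the significance threshold.
    z_alpha = -std_normal.inv_cdf(alpha / 2)
    # Quantile corresponding to the desired power.
    z_power = std_normal.inv_cdf(power)
    q = variance_explained
    return ceil((z_alpha + z_power) ** 2 * (1 - q) / q)


# The two effect sizes mentioned in the text, given as percentages.
for pct in (0.1, 0.0001):
    n = required_sample_size(pct / 100)
    print(f"{pct}% of trait variance -> roughly {n:,} participants")
```

Under these assumptions, a variant explaining 0.1% of the variance needs a cohort of roughly 40,000, while one explaining 0.0001% needs about 40 million, which is why cohort size limits how many small-effect genes a study can find.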

Explaining the genetic basis of complex traits requires substantial quantities of genomic data, a need that has been greatly helped by the exponential decline in the cost of genomic technologies. Currently, a genotyping array costs on the order of tens of dollars, and whole-genome sequencing (WGS) costs ~$1,000. However, collecting genetic data together with phenotype data is a time- and resource-consuming task that poses massive logistical and operational challenges. On top of the costs of genotyping, researchers first need to advertise the study, recruit participants, obtain consent, provide DNA collection kits, track and store samples, extract DNA, and prepare the DNA library, all before the data are available in a digital format.

Phenotyping requires further resources, even when done using online questionnaires. These operations are labor-intensive and incur massive costs. For example, the U.S. National Institutes of Health’s Precision Medicine Initiative (“All of Us”) has recently allocated $50 million for recruitment centers and biobank operations that collectively propose to recruit and handle biospecimens and basic phenotypic information from a total of ~500,000 participants.

The past 5 years have witnessed the advent of large-scale direct-to-consumer (DTC) genetic services for genealogy, with companies such as 23andMe, AncestryDNA, FamilyTreeDNA, and MyHeritage. These services provide a dense genotyping array with ~0.5 million single-nucleotide variants (SNVs) for ~$69–$99 per participant. As of today, more than 8 million individuals have been tested with these services, and >10,000 new DTC kits are purchased daily.

Building upon these observations, the authors [see attached article] developed DNA.Land, a website to crowdsource genomic and phenotypic information for human genetics research. DNA.Land has two overall goals: (a) to demonstrate the potential for genotype and phenotype collection by crowdsourcing data from users of DTC companies, and (b) to promote the idea of patient-led genetic research, with control left to the participants (for example, each participant chooses how widely their phenotype data may be shared and has the means to provide feedback to researchers). In 20 months of operation, DNA.Land has collected more than 50,000 genomic datasets from DTC participants, and the collection is growing daily.

Nature Genet Feb 2018; 50: 160–165
