Perspectives on rigor and reproducibility in single-cell genomics (scRNAseq)

“Single-cell genomics” is a “hot button topic” these days, and these techniques are beginning to transform biological research. One burning question, ever since this technology first appeared on the horizon, is: how reproducible are its findings? This invited “Perspective,” authored by Greg Gibson [see attached article], is a great summary of where the field has been and where it’s going.

Over the past 10 years, single-cell genomics (scRNAseq) has emerged as a powerful approach to cell biology, one whose impact, relative to bulk tissue genomics, “can be likened to the transition from light to electron microscopy.” A PubMed query using the words “single cell genomics” revealed more than 5,000 papers a year since 2012, for a field that is still in its infancy. Analyses at the single-cell level can uncover properties of tissues that are barely approachable with bulk methods. Five examples are: [a] information about variations in cell-type abundance in normal vs pathological conditions; [b] identities of the cell-types that explain disease; [c] the molecular basis of intercellular signaling; [d] the trajectories of cells as they mature or senesce; and [e] intracellular resolution of regulatory mechanisms.

None of this would be possible without the parallel emergence of powerful pipelines and algorithms for data analysis, and advances in high-performance computing that facilitate routine handling of multi-terabyte (TB) datasets. For a few years, it seemed that there were more publications presenting new methods for single-cell analysis than there were experimental findings, but remarkably the field has quickly settled around three basic pipelines (Seurat, Monocle, and Scanpy), which is certainly contributing to comparability and accessibility.

This perspective summarizes the issues that scientists — in the field of single-cell transcriptomics (scRNAseq) — must always consider, but similar considerations apply just as well to: single-nucleus epigenomic profiling (methylation and ATACseq); spatial sequencing; single-cell proteomics; metabolomics; and undoubtedly other soon-to-emerge technologies. Integration of all of these — along with 3D-imaging, electrophysiology, axon-tracing and other methods across species — already appears to be the next phase of single-cell neurobiology. Other similar multimodal approaches are promised soon for other organs.

The attached “Perspective” covers the following topics: demonstrating repeatability in at least one subset (as is now mandatory with genome-wide association studies, GWAS); establishing the first post-quality-control step of assigning cells to clusters; evaluating the statistical procedures used for each specific study; evaluating carefully how cells are pooled from different individuals; and applying a consistent algorithm to each sample (or cell) without any attempt to adjust for covariates across samples (or cells) that would potentially optimize discovery.
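One way to put a number on the repeatability of cluster assignments is to cluster the same cells twice (for example, once on the full dataset and once on a subset) and compare the two sets of labels. The sketch below is only an illustration, not a method taken from the “Perspective”: it assumes the adjusted Rand index as the agreement score, and the function name and the example labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two cluster assignments of the same cells.

    Returns 1.0 for identical partitions (cluster names may differ)
    and values near 0 for agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same cells")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: nothing to disagree about

    # Count pairs of cells that share a cluster in both runs, in run A, in run B.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = same_a * same_b / total_pairs  # agreement expected by chance
    maximum = (same_a + same_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial and identical
    return (same_both - expected) / (maximum - expected)


# Hypothetical labels for the same six cells from two independent runs.
run_full = ["T", "T", "T", "B", "B", "B"]
run_subset = [1, 1, 1, 0, 0, 0]
print(adjusted_rand_index(run_full, run_subset))
```

A score well below 1 for cells that were clustered twice suggests that downstream conclusions depend on the clustering step and should be reported with that caveat.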

Widespread access to, and adoption of, a handful of analytical strategies by a generation of trainees well versed in biocomputing is propelling the field forward at a rapid pace. On the other hand, training in the underlying statistical foundations is less available, which, combined with a culture that favors under-reporting of the effect of analytical decisions, is contributing to over-confidence. Even novice scientists are well aware of the unpredictability of quantitative findings, yet they are being asked to gloss over it in their preparation of tidy analyses that fail to address issues of reproducibility.

Thirteen years ago, Ioannidis et al. [Nat Genet 2009; 41: 149-155] published a sobering study of the low reproducibility of microarray-based gene expression profiling (a predecessor of RNAseq), in which they failed to computationally replicate, even approximately, more than half of the key findings in 18 high-profile studies. The problem in relation to single-cell profiling is likely to be at least as challenging. It is hoped that implementation of some of the recommendations summarized in this “Perspective” will encourage more biologist-statistician collaboration, and “nudge the field toward more acceptance of the ambiguity in single-cell genomic interpretation” as a consequence of the complexity of the datasets. ☹


PLoS Genet May 2022; 18: e1010210
