Issues arising from benchmarking single-cell RNA sequencing imputation methods
Wei Vivian Li, Jingyi Jessica Li

TL;DR
This paper critically re-evaluates the benchmarking of scRNA-seq imputation methods. It shows that earlier analyses based on semi-synthetic data may not reflect performance on real data, and it argues for biologically grounded evaluation.
Contribution
It highlights the limitations of semi-synthetic benchmarking approaches and demonstrates the importance of using real data and biological context for evaluating imputation methods.
Findings
Semi-synthetic data differ significantly from real scRNA-seq data.
Cell clusters in benchmarks are inconsistent with known biological cell types.
Reanalysis with real data yields different conclusions from previous studies.
Abstract
On June 25th, 2018, Huang et al. published SAVER, a computational method for imputing dropout gene expression levels in single-cell RNA sequencing (scRNA-seq) data, in Nature Methods. Huang et al. performed a set of comprehensive benchmarking analyses, including a comparison with data from RNA fluorescence in situ hybridization, to demonstrate that SAVER outperformed two existing scRNA-seq imputation methods, scImpute and MAGIC. However, their computational analyses were based on semi-synthetic data that the authors had generated following the Poisson-Gamma model used in the SAVER method. We have therefore re-examined Huang et al.'s study. We find that the semi-synthetic data have very different properties from those of real scRNA-seq data and that the cell clusters used for benchmarking are inconsistent with the cell types labeled by biologists. We show that a reanalysis based on real…
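To make the term "Poisson-Gamma model" concrete, the following is a minimal, generic sketch of sampling counts from a Poisson distribution whose rate is itself Gamma-distributed. It is not Huang et al.'s data-generation procedure, which this summary does not describe; the function names, the parameter values, and the `size_factor` argument are all assumptions chosen for illustration.

```python
import math
import random
from statistics import mean, pvariance


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_poisson_gamma(shape, scale, size_factor, n_cells, seed=0):
    """Hypothetical helper: counts for one gene across n_cells cells.

    Each cell gets a latent expression rate drawn from Gamma(shape, scale);
    the observed count is Poisson with mean size_factor * rate.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_cells):
        rate = rng.gammavariate(shape, scale)  # latent expression level
        counts.append(sample_poisson(size_factor * rate, rng))
    return counts


# Illustrative parameters only (assumed, not taken from the paper).
counts = sample_poisson_gamma(shape=2.0, scale=1.5, size_factor=1.0, n_cells=5000)
print("mean:", mean(counts), "variance:", pvariance(counts))
```

Marginally, such counts follow a negative binomial distribution, so their variance exceeds their mean. Data simulated this way inherit the model's assumptions, which is one reason a method built on the same model may look stronger on it than on real data.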
Taxonomy
Topics: Single-cell and spatial transcriptomics · Gene expression and cancer classification · Cancer Genomics and Diagnostics
