Optimal sequencing depth for single-cell RNA-sequencing in Wasserstein space
Jakwang Kim, Sharvaj Kubal, Geoffrey Schiebinger

TL;DR
This paper studies how to choose the sequencing depth in single-cell RNA-sequencing, where limited sequencing depth means that profiling more cells leaves fewer reads per cell. It does so by bounding the Wasserstein distance between the empirical distribution of cells and the true population.
Contribution
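The paper provides upper and lower bounds on the Wasserstein distance between an empirical distribution of sequenced cells and the true population. The bounds hold for general, non-parametric distributions of cells and can guide sequencing depth decisions.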
Findings
Derives upper and lower bounds on the Wasserstein distance between the empirical distribution of cells and the true population
Validates the bounds with simulation experiments on a real single-cell dataset
Offers practical guidance on sequencing depth in single-cell studies
Abstract
How many samples should one collect for an empirical distribution to be as close as possible to the true population? This question is not trivial in the context of single-cell RNA-sequencing. With limited sequencing depth, profiling more cells comes at the cost of fewer reads per cell. Therefore, one must strike a balance between the number of cells sampled and the accuracy of each measured gene expression profile. In this paper, we analyze an empirical distribution of cells and obtain upper and lower bounds on the Wasserstein distance to the true population. Our analysis holds for general, non-parametric distributions of cells, and is validated by simulation experiments on a real single-cell dataset.
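The trade-off described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' method or experimental setup. It assumes a hypothetical one-gene population in which each cell's expression fraction is drawn from a Beta(2, 5) distribution, a fixed read budget split evenly across cells, and a quantile-based approximation of the one-dimensional Wasserstein-1 distance. All function names and parameter values are illustrative assumptions.

```python
import random


def quantile(sorted_xs, q):
    """Empirical quantile of an already sorted sample at level q in [0, 1)."""
    return sorted_xs[min(int(q * len(sorted_xs)), len(sorted_xs) - 1)]


def w1_1d(xs, ys, grid=1000):
    """Approximate the 1-D Wasserstein-1 distance by averaging quantile gaps."""
    xs, ys = sorted(xs), sorted(ys)
    levels = [(k + 0.5) / grid for k in range(grid)]
    return sum(abs(quantile(xs, q) - quantile(ys, q)) for q in levels) / grid


def sample_true_fraction(rng):
    """Hypothetical population: one gene's expression fraction per cell."""
    return rng.betavariate(2.0, 5.0)


def sequence_cells(n_cells, budget, rng):
    """Split a fixed read budget evenly across cells; return observed fractions."""
    reads_per_cell = max(budget // n_cells, 1)
    observed = []
    for _ in range(n_cells):
        p = sample_true_fraction(rng)
        # Each read hits the gene with probability p (binomial measurement noise).
        hits = sum(rng.random() < p for _ in range(reads_per_cell))
        observed.append(hits / reads_per_cell)
    return observed


def tradeoff_curve(budget=20000, cell_counts=(10, 100, 1000, 10000), seed=0):
    """Distance to a large reference sample of the true population, per cell count."""
    rng = random.Random(seed)
    reference = [sample_true_fraction(rng) for _ in range(20000)]
    return {n: w1_1d(sequence_cells(n, budget, rng), reference) for n in cell_counts}


if __name__ == "__main__":
    for n, dist in tradeoff_curve().items():
        print(f"cells={n:6d}  reads/cell={20000 // n:5d}  W1~{dist:.4f}")
```

With very few cells the estimate suffers from sampling noise across cells, and with very many cells each profile is measured with only a handful of reads. Running the script prints the approximate distance at each split, so the two error sources can be compared under the assumed toy model.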
Taxonomy
Topics: Single-cell and spatial transcriptomics
