Practical limitations for real-life application of data fission and data thinning in post-clustering differential analysis
Benjamin Hivert, Denis Agniel, Rodolphe Thiébaut, Boris P. Hejblum

TL;DR
This paper critically examines the practical limitations of data fission and data thinning methods in post-clustering differential analysis of single-cell RNA sequencing, highlighting their reliance on strong assumptions and the challenges in real-world applications.
Contribution
The study introduces conditional data fission, which decomposes each mixture component into two independent parts, and shows that valid post-clustering inference with it requires prior knowledge of the clustering structure, a fundamental limitation in practice.
Findings
Data fission requires known clustering structure for validity.
Biases in parameter estimation inflate Type I error rates.
Applying data fission in practice is fundamentally challenging.
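The link between a biased scale estimate and Type I error can be illustrated with a toy simulation. This is a minimal sketch, not the authors' procedure. It assumes the Gaussian form of data fission (X1 = X + Z and X2 = X - Z with Z ~ N(0, sigma_used^2)), a median split of X1 as a stand-in for a clustering step, and a normal-approximation two-sample test. The function names, sample sizes, and the value sigma_used = 2 are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def two_sample_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def type_one_error(sigma_used, n=200, reps=400, alpha=0.05, seed=1):
    """Rejection rate under the null (one cluster, true sigma = 1)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Null data: a single Gaussian population, so no true clusters exist.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Fission noise drawn with the scale the analyst *believes* is right.
        z = [rng.gauss(0.0, sigma_used) for _ in range(n)]
        x1 = [xi + zi for xi, zi in zip(x, z)]  # part used for "clustering"
        x2 = [xi - zi for xi, zi in zip(x, z)]  # part used for testing
        # Assumption: a median split of x1 stands in for a 2-cluster algorithm.
        cut = statistics.median(x1)
        low = [b for a, b in zip(x1, x2) if a <= cut]
        high = [b for a, b in zip(x1, x2) if a > cut]
        if two_sample_p_value(low, high) < alpha:
            rejections += 1
    return rejections / reps


rate_correct = type_one_error(sigma_used=1.0)  # scale matches the truth
rate_biased = type_one_error(sigma_used=2.0)   # scale overestimated
print(f"rejection rate, correct sigma: {rate_correct:.3f}")
print(f"rejection rate, biased sigma:  {rate_biased:.3f}")
```

With the correct scale, X1 and X2 are independent and the rejection rate should stay near the nominal level. With an overestimated scale, the two parts are negatively correlated, so groups defined on X1 differ on X2 even though no real clusters exist.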
Abstract
Post-clustering inference in single-cell RNA sequencing (scRNA-seq) analysis presents significant challenges in controlling Type I error during differential expression analysis. Data fission, a promising approach that aims to split data into two independent parts, relies on strong parametric assumptions of non-mixture distributions that are inherently violated in clustered data. To address this limitation, we introduce conditional data fission, an extension designed to decompose each mixture component into two independent parts. However, we demonstrate that applying such conditional data fission to mixture distributions requires prior knowledge of the clustering structure to ensure valid post-clustering inference. This arises from the need to accurately estimate component-specific scale parameters, which are critical for performing decomposition while maintaining independence. We…
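The role of the component-specific scale can be made concrete in the Gaussian case. The following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes the Gaussian fission scheme X1 = X + tau * Z and X2 = X - Z / tau with Z ~ N(0, sigma_hat^2), for which Cov(X1, X2) = sigma_k^2 - sigma_hat^2 within a component of variance sigma_k^2. The two parts are therefore uncorrelated only when sigma_hat matches the component's own scale. The two-component mixture, the helper names, and the sample size are hypothetical.

```python
import random
import statistics


def gaussian_fission(x, sigma_hat, rng, tau=1.0):
    """Split x into (x1, x2) using noise z ~ N(0, sigma_hat^2)."""
    z = rng.gauss(0.0, sigma_hat)
    return x + tau * z, x - z / tau


def sample_covariance(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)


def fission_covariance(xs, sigma_hat, seed=0):
    """Covariance between the two fission parts for a given scale."""
    rng = random.Random(seed)
    parts = [gaussian_fission(x, sigma_hat, rng) for x in xs]
    x1, x2 = zip(*parts)
    return sample_covariance(x1, x2)


data_rng = random.Random(42)
n = 20000
# Hypothetical mixture: two components with unit variance and means 0 and 4.
component_a = [data_rng.gauss(0.0, 1.0) for _ in range(n)]
component_b = [data_rng.gauss(4.0, 1.0) for _ in range(n)]

# Without cluster labels, only the pooled scale is directly estimable,
# and it is inflated by the separation between component means.
pooled_sd = statistics.stdev(component_a + component_b)

# Within component A: true scale versus pooled scale.
cov_true_scale = fission_covariance(component_a, sigma_hat=1.0)
cov_pooled_scale = fission_covariance(component_a, sigma_hat=pooled_sd)
print(f"pooled sd: {pooled_sd:.2f}")
print(f"within-component covariance, true scale:   {cov_true_scale:.3f}")
print(f"within-component covariance, pooled scale: {cov_pooled_scale:.3f}")
```

Using the true component scale leaves the within-component covariance near zero, while the pooled scale produces a clearly negative covariance. Estimating the component scale accurately in turn requires knowing which observations belong to which component.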
Taxonomy
Topics: Single-cell and spatial transcriptomics · Statistical Methods and Inference · Gene expression and cancer classification
