Testing for a difference in means of a single feature after clustering
Yiqun T. Chen, Lucy L. Gao

TL;DR
This paper introduces a statistical test for a difference in the mean of a single feature between two estimated clusters. The test controls the selective Type I error rate in finite samples and is demonstrated on simulated and single-cell RNA-sequencing data.
Contribution
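The authors develop a finite-sample valid test for a difference in means of a single feature between a pair of clusters obtained by hierarchical or k-means clustering. The test accounts for the fact that the clusters were estimated from the same data used for testing.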
Findings
The test controls the selective Type I error rate in finite samples and can be computed efficiently.
Simulation studies show the test has good power.
Application to single-cell RNA-sequencing data demonstrates practical utility.
Abstract
For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common validation approach involves testing differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we propose a new test for the difference in means in a single feature between a pair of clusters obtained using hierarchical or k-means clustering. The test based on the proposed p-value controls the selective Type I error rate in finite samples and can be efficiently computed. We further illustrate the validity and power of our proposal in simulation and demonstrate its use on single-cell RNA-sequencing data.
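The inflation described in the abstract can be seen in a small simulation. The sketch below is an illustration only and is not the authors' method: it draws a single feature from one normal distribution (so there are no true clusters), splits it with a hand-written one-dimensional 2-means routine, and then applies a naive two-sample test that ignores the clustering step. The helper names (`two_means_1d`, `naive_p_value`), the sample size, the number of repetitions, and the normal approximation to Welch's test are all assumptions made for this example.

```python
import math
import random
import statistics


def two_means_1d(x, iters=50):
    """Lloyd's algorithm with k=2 on one feature, initialised at min and max."""
    c0, c1 = min(x), max(x)
    labels = [0] * len(x)
    for _ in range(iters):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in x]
        g0 = [v for v, lab in zip(x, labels) if lab == 0]
        g1 = [v for v, lab in zip(x, labels) if lab == 1]
        new0, new1 = statistics.fmean(g0), statistics.fmean(g1)
        if new0 == c0 and new1 == c1:
            break
        c0, c1 = new0, new1
    return labels


def naive_p_value(x, labels):
    """Two-sided Welch-type test (normal approximation) that ignores clustering."""
    g0 = [v for v, lab in zip(x, labels) if lab == 0]
    g1 = [v for v, lab in zip(x, labels) if lab == 1]
    if len(g0) < 2 or len(g1) < 2:
        return 1.0  # too few points to estimate a variance
    se = math.sqrt(statistics.variance(g0) / len(g0)
                   + statistics.variance(g1) / len(g1))
    z = (statistics.fmean(g0) - statistics.fmean(g1)) / se
    return math.erfc(abs(z) / math.sqrt(2))


rng = random.Random(0)
n_reps, n_obs, alpha = 200, 100, 0.05
rejections = 0
for _ in range(n_reps):
    # Null data: every observation has the same mean, so no real clusters exist.
    x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    labels = two_means_1d(x)
    if naive_p_value(x, labels) < alpha:
        rejections += 1

rate = rejections / n_reps
print(f"Naive rejection rate under the null: {rate:.2f} (nominal level {alpha})")
```

Because the clusters are chosen to separate the data, the naive test rejects far more often than the nominal 5% level even though no true difference exists. A selective test such as the one proposed in the paper is designed to correct for this by conditioning on the clustering outcome.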
Taxonomy
Topics: Single-cell and spatial transcriptomics · Gene expression and cancer classification · Gene Regulatory Network Analysis
