Testing for a difference in means of a single feature after clustering

Yiqun T. Chen; Lucy L. Gao

arXiv:2311.16375 · stat.ME · November 29, 2023 · 1 cites

TL;DR

This paper proposes a test for a difference in means of a single feature between two clusters estimated by hierarchical or $k$-means clustering. The test controls the selective Type I error rate in finite samples and is demonstrated on simulated data and single-cell RNA-sequencing data.

Contribution

The authors develop a finite-sample valid test for a difference in a single feature's mean between two estimated clusters. The test accounts for the fact that the clusters were estimated from the same data, which classical hypothesis tests ignore.

Findings

01. The test controls the selective Type I error rate in finite samples.

02. Simulation studies illustrate the validity and power of the test.

03. An application to single-cell RNA-sequencing data demonstrates practical utility.
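The error-control claim in the first finding can be written out as a rough sketch. The notation here is assumed for illustration and does not appear on this page: $\hat{C}_1$ and $\hat{C}_2$ are the two clusters estimated from the data $X$, $\mu_{ij}$ is the mean of feature $j$ for observation $i$, and $p_j$ is the $p$-value for feature $j$.

```latex
% Null hypothesis: no difference in the mean of feature j between the two estimated clusters
H_{0j}:\quad
\frac{1}{|\hat{C}_1|}\sum_{i \in \hat{C}_1} \mu_{ij}
\;=\;
\frac{1}{|\hat{C}_2|}\sum_{i \in \hat{C}_2} \mu_{ij}

% Selective Type I error control: the guarantee is conditional on the clustering outcome
\mathbb{P}_{H_{0j}}\!\left( p_j \le \alpha \;\middle|\; \hat{C}_1,\ \hat{C}_2 \text{ are clusters estimated from } X \right) \;\le\; \alpha
\qquad \text{for all } \alpha \in [0, 1]
```

The conditioning is what separates this guarantee from the classical one: the null hypothesis is chosen only after the clusters have been seen, so the error rate is evaluated given that choice.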

Abstract

For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common validation approach involves testing differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we propose a new test for the difference in means in a single feature between a pair of clusters obtained using hierarchical or $k$-means clustering. The test based on the proposed $p$-value controls the selective Type I error rate in finite samples and can be efficiently computed. We further illustrate the validity and power of our proposal in simulation and demonstrate its use on single-cell RNA-sequencing data.
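The inflation described in the abstract can be reproduced with a small simulation. The sketch below is not the authors' method or code. It only illustrates the problem that motivates the paper: data with no true clusters are split by a minimal 2-means routine, and a naive two-sample z-test is then run on one feature. The function names (`two_means`, `naive_p_value`, `naive_rejection_rate`), the sample sizes, and the use of a normal approximation in place of a t-test are all assumptions made for this illustration.

```python
import math
import random


def two_means(points, rng, n_iter=25):
    """Minimal Lloyd's algorithm with k = 2 on a list of (x0, x1) tuples."""
    centers = rng.sample(points, 2)  # initialise at two distinct observations
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centre in squared Euclidean distance.
        labels = [
            0 if sum((a - c) ** 2 for a, c in zip(p, centers[0]))
            <= sum((a - c) ** 2 for a, c in zip(p, centers[1])) else 1
            for p in points
        ]
        # Update step: recompute each centre as the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def naive_p_value(a, b):
    """Two-sided two-sample z-test p-value (normal approximation to Welch's t)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    z = (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))
    return math.erfc(abs(z) / math.sqrt(2))


def naive_rejection_rate(n_sims=300, n_obs=60, alpha=0.05, seed=0):
    """Fraction of null datasets on which the naive test rejects for feature 0."""
    rng = random.Random(seed)
    rejections = used = 0
    for _ in range(n_sims):
        # Global null: every observation is N(0, I_2), so no true clusters exist.
        points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_obs)]
        labels = two_means(points, rng)
        a = [p[0] for p, lab in zip(points, labels) if lab == 0]
        b = [p[0] for p, lab in zip(points, labels) if lab == 1]
        if len(a) < 2 or len(b) < 2:
            continue  # skip degenerate splits where a variance is undefined
        used += 1
        rejections += naive_p_value(a, b) < alpha
    return rejections / used


rate = naive_rejection_rate()
print(f"naive rejection rate under the null: {rate:.2f} (nominal level 0.05)")
```

Because the clustering step separates observations along the very features that are then tested, the naive rejection rate lands well above the nominal 0.05 even though the data contain no real group structure. A selective test of the kind the paper proposes is designed to account for this double use of the data.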

Taxonomy

Topics: Single-cell and spatial transcriptomics · Gene expression and cancer classification · Gene Regulatory Network Analysis