Selective inference for multiple pairs of clusters after K-means clustering
Youngjoo Yun, Yinqiu He

TL;DR
This paper develops methods for selective inference on multiple cluster pairs after K-means clustering, controlling Type I error in testing differences between clusters, even with unknown variance and data-dependent pair selection.
Contribution
It extends existing pairwise cluster testing to multiple pairs, incorporating unknown variance and data-dependent pair selection, with theoretical and empirical error control.
Findings
Proposed tests control Type I error both theoretically and empirically.
The methods apply to testing multiple pairs of clusters produced by K-means clustering.
Numerical studies demonstrate good empirical power under various scenarios.
Abstract
If the same data is used for both clustering and for testing a null hypothesis that is formulated in terms of the estimated clusters, then the traditional hypothesis testing framework often fails to control the Type I error. Gao et al. [2022] and Chen and Witten [2023] provide selective inference frameworks for testing if a pair of estimated clusters indeed stem from underlying differences, for the case where hierarchical clustering and K-means clustering, respectively, are used to define the clusters. In applications, however, it is often of interest to test for multiple pairs of clusters. In our work, we extend the pairwise test of Chen and Witten [2023] to a test for multiple pairs of clusters, where the cluster assignments are produced by K-means clustering. We further develop an analogous test for the setting where the variance is unknown, building on the work of Yun and Barber…
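The failure described in the first sentence of the abstract can be seen in a small simulation. The sketch below is an illustration of the problem only, not the paper's selective test: it draws one-dimensional Gaussian noise with no true clusters, splits it with a plain Lloyd's-algorithm K-means (K = 2), and applies a naive two-sided z-test with known variance to the two estimated clusters. All function names (`kmeans_1d`, `naive_z_pvalue`, `rejection_rate`, `fixed_split`) and the settings (n = 30, 500 repetitions, sigma = 1) are assumptions made for this example.

```python
import math
import random


def kmeans_1d(x, rng, k=2, max_iter=100):
    """Plain Lloyd's algorithm on 1-D data; returns one cluster label per point."""
    centers = rng.sample(x, k)  # initialise centers at k distinct data points
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: (xi - centers[j]) ** 2) for xi in x]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [xi for xi, lab in zip(x, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def fixed_split(x, rng):
    """Labels chosen without looking at the data: first half vs second half."""
    half = len(x) // 2
    return [0] * half + [1] * (len(x) - half)


def naive_z_pvalue(x, labels, sigma=1.0):
    """Two-sided z-test for equal means of groups 0 and 1.

    Naive on purpose: it ignores how the labels were chosen.
    """
    a = [xi for xi, lab in zip(x, labels) if lab == 0]
    b = [xi for xi, lab in zip(x, labels) if lab == 1]
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(label_fn, n=30, reps=500, alpha=0.05, seed=0):
    """Fraction of null data sets on which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # global null: a single population
        labels = label_fn(x, rng)
        rejections += naive_z_pvalue(x, labels) < alpha
    return rejections / reps


data_dependent = rejection_rate(kmeans_1d)
data_independent = rejection_rate(fixed_split)
print(f"naive test after K-means:    rejection rate = {data_dependent:.3f}")
print(f"naive test with fixed split: rejection rate = {data_independent:.3f}")
```

With labels fixed in advance, the rejection rate should stay near the nominal level; with labels estimated by K-means on the same data, the naive test should reject far more often, because the clusters are constructed to have different means. A selective test is meant to correct for this by conditioning on the clustering outcome.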
Taxonomy
Topics: Face and Expression Recognition · Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research
