Selective inference for clustering with unknown variance

Youngjoo Yun; Rina Foygel Barber

arXiv:2301.12999 · stat.ME · July 24, 2023

TL;DR

This paper develops a selective inference method for clustering that accounts for unknown variance, enabling valid hypothesis testing on data-dependent clusters while controlling false discoveries.

Contribution

It extends existing selective inference frameworks to handle unknown noise variance in clustering, improving validity and power.

Findings

1. The method maintains high power when the variance is unknown.

2. It controls the Type I error in practical scenarios.

3. It outperforms previous methods that assume known variance.

Abstract

In many modern statistical problems, the limited available data must be used both to develop the hypotheses to test and to test these hypotheses, that is, for both exploratory and confirmatory data analysis. Reusing the same dataset for both exploration and testing can lead to massive selection bias and, in turn, to many false discoveries. Selective inference is a framework that allows for performing valid inference even when the same data is reused for exploration and testing. In this work, we are interested in the problem of selective inference for data clustering, where a clustering procedure is used to hypothesize a separation of the data points into a collection of subgroups, and we then wish to test whether these data-dependent clusters in fact represent meaningful differences within the data. Recent work by Gao et al. [2022] provides a framework for doing selective inference for this…
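The selection bias described in the abstract can be seen in a small simulation. The sketch below is not the method of this paper or of Gao et al. [2022]; it only illustrates the problem they address. It draws one-dimensional data with no true clusters, splits it with exact 2-means, and then applies a naive two-sample z-test that ignores how the groups were chosen. All function names, the sample size, the number of trials, and the use of a known variance of 1 are assumptions made for illustration.

```python
import random
from statistics import NormalDist, fmean


def two_means_1d(xs):
    """Exact 2-means in one dimension: try every split of the sorted data."""
    s = sorted(xs)
    best_cost, best_k = float("inf"), 1
    for k in range(1, len(s)):
        left, right = s[:k], s[k:]
        ml, mr = fmean(left), fmean(right)
        # Within-cluster sum of squares for this split.
        cost = sum((x - ml) ** 2 for x in left) + sum((x - mr) ** 2 for x in right)
        if cost < best_cost:
            best_cost, best_k = cost, k
    return s[:best_k], s[best_k:]


def naive_p_value(left, right, sigma=1.0):
    """Two-sided z-test for equal means that ignores the clustering step.

    sigma is treated as known here (an illustrative assumption).
    """
    se = sigma * (1 / len(left) + 1 / len(right)) ** 0.5
    z = (fmean(right) - fmean(left)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def naive_rejection_rate(n=30, trials=200, alpha=0.05, seed=0):
    """Fraction of null datasets on which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # a single Gaussian: no true clusters
        left, right = two_means_1d(xs)
        if naive_p_value(left, right) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print(naive_rejection_rate())
```

A valid test would reject on roughly 5% of these null datasets. The naive test rejects far more often, because the clustering step picks the split that maximizes the very difference being tested.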

Code & Models

Repositories

yjyun97/cluster_inf_unknown_var (official)

Taxonomy

Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Anomaly Detection Techniques and Applications