Distribution free optimality intervals for clustering

Marina Meilă; Hanyu Zhang

arXiv:2107.14442·stat.ML·February 2, 2023

TL;DR

This paper introduces a distribution-free method to validate clustering results by providing guarantees on their optimality and stability, applicable to various loss functions without relying on distributional assumptions.

Contribution

It presents a generic convex optimization-based approach to obtain post-inference guarantees for clustering quality and stability across different criteria.

Findings

1. Guarantees for K-means and Normalized Cut clustering on real datasets.

2. Asymptotic instability implies finite sample instability with high probability.

3. Method does not depend on distributional assumptions, only on data stability.

Abstract

We address the problem of validating the output of clustering algorithms. Given data $D$ and a partition $C$ of these data into $K$ clusters, when can we say that the clusters obtained are correct or meaningful for the data? This paper introduces a paradigm in which a clustering $C$ is considered meaningful if it is good with respect to a loss function such as the K-means distortion, and stable, i.e. the only good clustering up to small perturbations. Furthermore, we present a generic method to obtain post-inference guarantees of near-optimality and stability for a clustering $C$. The method can be instantiated for a variety of clustering criteria (also called loss functions) for which convex relaxations exist. Obtaining the guarantees amounts to solving a convex optimization problem. We demonstrate the practical relevance of this method by…
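
As a rough illustration of what "good with respect to a loss function" means for the K-means distortion, the sketch below scores two partitions of the same toy data. It assumes the standard unnormalized definition (sum of squared Euclidean distances from each point to its cluster mean). The function name `kmeans_distortion` and the toy data are hypothetical and not taken from the paper. The sketch only evaluates the loss; it does not compute the paper's guarantees, which require solving a convex relaxation.

```python
def kmeans_distortion(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean.

    Assumption: unnormalized K-means distortion; some texts divide by n.
    """
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Cluster mean, computed coordinate by coordinate.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Accumulate squared distances of the members to that mean.
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_labels = [0, 0, 1, 1]  # groups nearby points together
bad_labels = [0, 1, 0, 1]   # pairs each point with a far-away one

good_loss = kmeans_distortion(points, good_labels)
bad_loss = kmeans_distortion(points, bad_labels)
print(good_loss, bad_loss)
```

On this toy data the first partition has a much lower distortion than the second. A low loss alone is not the paper's criterion: the clustering must also be stable, meaning every other clustering with a comparably low loss is only a small perturbation of it.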

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Statistical Methods and Inference · Bayesian Methods and Mixture Models