Selecting the Number of Clusters $K$ with a Stability Trade-off: an Internal Validation Criterion
Alex Mourer, Florent Forest, Mustapha Lebbah, Hanane Azzag, and Jérôme Lacaille

TL;DR
This paper introduces a new internal validation criterion for selecting the optimal number of clusters by balancing between-cluster and within-cluster stability, addressing limitations of existing stability-based methods.
Contribution
It proposes a novel stability trade-off principle for cluster validation, enabling more accurate determination of the number of clusters in non-parametric clustering.
Findings
The new criterion effectively selects the number of clusters in various datasets.
It outperforms existing stability-based methods in empirical tests.
The approach is model-agnostic and easy to implement.
Abstract
Model selection is a major challenge in non-parametric clustering. There is no universally accepted way to evaluate clustering results for the obvious reason that no ground truth is available. The difficulty of finding a universal evaluation criterion is a consequence of the ill-defined objective of clustering. In this perspective, clustering stability has emerged as a natural and model-agnostic principle: an algorithm should find stable structures in the data. If data sets are repeatedly sampled from the same underlying distribution, an algorithm should find similar partitions. However, stability alone is not well-suited to determine the number of clusters. For instance, it is unable to detect if the number of clusters is too small. We propose a new principle: a good clustering should be stable, and within each cluster, there should exist no stable partition. This principle leads to a…
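The generic stability principle described in the abstract can be illustrated with a small sketch. This is not the paper's criterion (the abstract is truncated before it is stated); it only shows plain subsampling stability. The choice of k-means with farthest-point initialization, the Rand index as the similarity measure, the 80% subsampling fraction, and all function names are assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def farthest_point_init(sample, k):
    """Start from the first sample point, then repeatedly add the point
    farthest from all centers chosen so far (illustrative choice)."""
    centers = [sample[0]]
    while len(centers) < k:
        centers.append(
            max(sample, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans_centers(sample, k, iters=20):
    """Plain Lloyd iterations; an empty cluster keeps its previous center."""
    centers = farthest_point_init(sample, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in sample:
            groups[nearest(p, centers)].append(p)
        centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree
    (both put the pair together, or both put it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, n_pairs=5, frac=0.8, seed=0):
    """Average Rand index between partitions fitted on two random
    subsamples and extended to all points by nearest-center assignment."""
    rng = random.Random(seed)
    m = int(frac * len(points))
    scores = []
    for _ in range(n_pairs):
        labelings = []
        for _ in range(2):
            centers = kmeans_centers(rng.sample(points, m), k)
            labelings.append([nearest(p, centers) for p in points])
        scores.append(rand_index(*labelings))
    return sum(scores) / len(scores)


def make_blobs(seed=0, per_blob=20):
    """Three well-separated square blobs (synthetic toy data)."""
    rng = random.Random(seed)
    return [
        (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
        for cx, cy in [(0, 0), (10, 0), (0, 10)]
        for _ in range(per_blob)
    ]


if __name__ == "__main__":
    pts = make_blobs()
    for k in range(1, 6):
        print(k, round(stability(pts, k), 3))
```

With K = 1 every point receives the same label, so the score is always 1.0 regardless of the data. This is the weakness the abstract points to: a stability score on its own cannot flag a number of clusters that is too small.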
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Data Mining Algorithms and Applications
