Selecting the Number of Clusters $K$ with a Stability Trade-off: an Internal Validation Criterion

Alex Mourer, Florent Forest, Mustapha Lebbah, Hanane Azzag, and Jérôme Lacaille

arXiv:2006.08530 · cs.LG · May 18, 2023 · 1 cites

TL;DR

This paper introduces a new internal validation criterion for selecting the optimal number of clusters by balancing between-cluster and within-cluster stability, addressing limitations of existing stability-based methods.

Contribution

It proposes a novel stability trade-off principle for cluster validation, enabling more accurate determination of the number of clusters in non-parametric clustering.

Findings

01. The new criterion effectively selects the number of clusters in various datasets.
02. It outperforms existing stability-based methods in empirical tests.
03. The approach is model-agnostic and easy to implement.

Abstract

Model selection is a major challenge in non-parametric clustering. There is no universally accepted way to evaluate clustering results, for the obvious reason that no ground truth is available. The difficulty of finding a universal evaluation criterion is a consequence of the ill-defined objective of clustering. From this perspective, clustering stability has emerged as a natural and model-agnostic principle: an algorithm should find stable structures in the data. If data sets are repeatedly sampled from the same underlying distribution, an algorithm should find similar partitions. However, stability alone is not well suited to determining the number of clusters. For instance, it is unable to detect whether the number of clusters is too small. We propose a new principle: a good clustering should be stable, and within each cluster, there should exist no stable partition. This principle leads to a…
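The basic stability notion described in the abstract can be illustrated with a minimal sketch. It is not the paper's criterion and does not use the skstab API: it assumes a plain k-means, random subsampling as the perturbation, and the unadjusted Rand index as the similarity between partitions. All function names (`kmeans`, `nearest`, `rand_index`, `stability`) and the toy data are hypothetical choices made for this illustration.

```python
import random
from itertools import combinations


def nearest(p, centers):
    """Index of the center closest to point p (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
    )


def kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd's k-means on a list of coordinate tuples; returns centers."""
    centers = rng.sample(points, k)  # random initialisation from the data
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group (keep it if empty).
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def rand_index(a, b):
    """Fraction of point pairs that both labelings treat the same way."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, rng, n_rounds=10, frac=0.8):
    """Mean pairwise Rand index between partitions of the full data set
    induced by models fitted on independent random subsamples."""
    labelings = []
    for _ in range(n_rounds):
        sub = rng.sample(points, int(frac * len(points)))
        centers = kmeans(sub, k, rng)
        labelings.append([nearest(p, centers) for p in points])
    scores = [rand_index(a, b) for a, b in combinations(labelings, 2)]
    return sum(scores) / len(scores)


# Toy data (an assumption for the demo): three separated 2-D Gaussian blobs.
rng = random.Random(0)
points = [
    (rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
    for cx, cy in [(0, 0), (5, 0), (0, 5)]
    for _ in range(30)
]

for k in (1, 2, 3, 4, 5):
    print(k, round(stability(points, k, rng), 3))
```

In this sketch, `K = 1` always scores exactly 1.0, because every run assigns all points to the same single cluster. That is one concrete way to see the abstract's point that stability alone cannot detect a number of clusters that is too small.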

Code & Models

Repositories

FlorentF9/skstab (Official)

Taxonomy

Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Data Mining Algorithms and Applications