The elbow statistic: Multiscale clustering statistical significance

Francisco J. Perez-Reche

arXiv:2603.03235·stat.ML·May 5, 2026

TL;DR

ElbowSig is a statistical framework for assessing the significance of clustering structures across multiple resolutions, enabling multiscale inference and improving detection of meaningful data organization.

Contribution

ElbowSig formalizes the elbow heuristic as a hypothesis-testing framework that applies to a range of clustering methods and assesses the statistical significance of clustering structure at multiple scales.

Findings

01 Controls Type-I error on unstructured data

02 Detects multiscale data organization effectively

03 Compatible with diverse clustering algorithms

Abstract

Selecting the number of clusters remains a fundamental challenge in unsupervised learning. Existing approaches typically focus on identifying a single "optimal" partition, often overlooking statistically meaningful structure present across multiple resolutions. We introduce ElbowSig, a general inferential framework for assessing clustering structure over a range of resolutions. The method formalizes the elbow heuristic by defining a normalized discrete curvature statistic based on the sequence of within-cluster heterogeneity values, and evaluates its significance relative to a null distribution of unstructured data. This yields hypothesis tests across resolutions, enabling simultaneous inference at multiple clustering scales. We derive the asymptotic behavior of the null statistic in both large-sample and high-dimensional regimes, characterizing its limiting form and variability.…
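The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the exact normalization or null model, so the code below makes several assumptions. Heterogeneity W_k is taken to be the k-means within-cluster sum of squares, the curvature at k is the second difference (W_{k-1} - 2 W_k + W_{k+1}) divided by W_1, and the null is uniform data in the bounding box of the observations, with Monte Carlo p-values in place of the asymptotic results. All function names (`kmeans_wcss`, `curvature`, `elbow_pvalues`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wcss(points, k, rng, n_init=3, n_iter=10):
    """Best within-cluster sum of squares over a few Lloyd restarts."""
    best = float("inf")
    for _ in range(n_init):
        centers = rng.sample(points, k)
        for _ in range(n_iter):
            groups = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                groups[j].append(p)
            # Recompute centroids; keep the old center if a cluster is empty.
            centers = [
                tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
                for j, g in enumerate(groups)
            ]
        wcss = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, wcss)
    return best


def curvature(w):
    """Second difference of W_1..W_K scaled by W_1 (assumed normalization).

    Entry i of the result corresponds to k = i + 2.
    """
    return [(w[i - 1] - 2 * w[i] + w[i + 1]) / w[0] for i in range(1, len(w) - 1)]


def elbow_pvalues(points, k_max, n_null=20, seed=0):
    """Observed curvature for k = 2..k_max-1 and Monte Carlo p-values.

    Assumed null model: points drawn uniformly in the data's bounding box.
    """
    rng = random.Random(seed)

    def profile(pts):
        return curvature([kmeans_wcss(pts, k, rng) for k in range(1, k_max + 1)])

    observed = profile(points)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    exceed = [0] * len(observed)
    for _ in range(n_null):
        null_pts = [
            tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
            for _ in points
        ]
        for i, c in enumerate(profile(null_pts)):
            exceed[i] += c >= observed[i]
    # Add-one correction keeps every p-value strictly positive.
    pvals = [(1 + e) / (1 + n_null) for e in exceed]
    return observed, pvals


# Toy data: three Gaussian blobs in the plane (illustrative only).
data_rng = random.Random(1)
data = [
    (data_rng.gauss(cx, 0.3), data_rng.gauss(cy, 0.3))
    for cx, cy in [(0, 0), (5, 0), (0, 5)]
    for _ in range(20)
]
observed, pvals = elbow_pvalues(data, k_max=5)
for k, (c, p) in enumerate(zip(observed, pvals), start=2):
    print(f"k={k}: curvature={c:.3f}, p={p:.3f}")
```

Each k is tested separately, which mirrors the idea of one test per resolution; no multiple-testing correction is applied here. The small numbers of restarts and null replicates keep the sketch fast, so k-means local optima and Monte Carlo noise can affect the printed values.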
