On uniqueness of the set of k-means
Javier Cárcamo, Antonio Cuevas, Luis A. Rodríguez

TL;DR
This paper establishes necessary and sufficient conditions for the set of k-means of a probability distribution to be unique, shows how the choice of k can produce multiple optimal sets, and provides a bootstrap test for uniqueness.
Contribution
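It adapts consistency results for the empirical k-means to the non-unique setting, determines the asymptotic distribution of the within-cluster sum of squares (WCSS), characterizes uniqueness through the asymptotic behavior of the empirical WCSS, and derives from this a bootstrap test for uniqueness.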
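For orientation, the objects involved can be written down as follows. This is the standard population formulation of k-means; the symbols $W$, $W_n$, $P_n$ and $a_i$ are notation assumed here for illustration and may differ from the paper's own.

```latex
% Population WCSS of a candidate set of k centers, for X ~ P on R^d
W(a_1,\dots,a_k) \;=\; \mathbb{E}\Big[\min_{1\le i\le k}\,\lVert X - a_i\rVert^2\Big]

% A set of k-means of P is any minimizer of W; uniqueness means that
% exactly one set {a_1,...,a_k} attains the minimum
\{a_1^*,\dots,a_k^*\} \;\in\; \operatorname*{arg\,min}_{\{a_1,\dots,a_k\}\subset\mathbb{R}^d} W(a_1,\dots,a_k)

% Empirical WCSS: the same functional with P replaced by the empirical
% measure P_n of a sample X_1,...,X_n
W_n(a_1,\dots,a_k) \;=\; \frac{1}{n}\sum_{j=1}^{n}\,\min_{1\le i\le k}\,\lVert X_j - a_i\rVert^2
```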
Findings
Necessary and sufficient conditions for uniqueness of the k-means set are given.
A bootstrap test for assessing uniqueness is developed.
Simulation results demonstrate the effectiveness of the proposed methodology.
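As a rough illustration of what non-uniqueness means, the sketch below brute-forces every 2-cluster partition of a four-point distribution and collects all distinct optimal center sets. It is a toy example, not the paper's method or its bootstrap test; the helper names `wcss` and `optimal_center_sets`, and the two point configurations, are hypothetical choices made for this illustration.

```python
from itertools import product

import numpy as np


def wcss(points, centers):
    """Mean squared distance from each point to its nearest center."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()


def optimal_center_sets(points, k, tol=1e-9):
    """Enumerate all label assignments (feasible only for tiny inputs)
    and return the minimal WCSS plus every distinct optimal center set."""
    best, found = np.inf, set()
    for labels in product(range(k), repeat=len(points)):
        labels = np.array(labels)
        if len(set(labels.tolist())) < k:
            continue  # skip assignments that leave a cluster empty
        centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        value = wcss(points, centers)
        # Order of the centers is irrelevant, so store them as a set.
        key = frozenset(tuple(np.round(c, 9).tolist()) for c in centers)
        if value < best - tol:
            best, found = value, {key}
        elif abs(value - best) <= tol:
            found.add(key)
    return best, found


# Uniform distribution on the corners of a square: symmetric in x and y.
square = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
# Corners of a 4 x 2 rectangle: the symmetry between the axes is broken.
rectangle = np.array([[2, 1], [2, -1], [-2, 1], [-2, -1]], dtype=float)

best_sq, sets_sq = optimal_center_sets(square, k=2)
best_rc, sets_rc = optimal_center_sets(rectangle, k=2)
print("square:   ", best_sq, len(sets_sq), "optimal set(s)")
print("rectangle:", best_rc, len(sets_rc), "optimal set(s)")
```

For the square, the left/right split and the top/bottom split attain the same minimal WCSS, so there are two sets of 2-means. Stretching the square into a rectangle leaves a single optimal set.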
Abstract
We provide necessary and sufficient conditions for the uniqueness of the k-means set of a probability distribution. This uniqueness problem is related to the choice of k: depending on the underlying distribution, some values of this parameter could lead to multiple sets of k-means, which hampers the interpretation of the results and/or the stability of the algorithms. We give a general assessment on consistency of the empirical k-means adapted to the setting of non-uniqueness and determine the asymptotic distribution of the within cluster sum of squares (WCSS). We also provide statistical characterizations of k-means uniqueness in terms of the asymptotic behavior of the empirical WCSS. As a consequence, we derive a bootstrap test for uniqueness of the set of k-means. The results are illustrated with examples of different types of non-uniqueness and we check by simulations the…
Taxonomy
Topics: Face and Expression Recognition
Methods: Sparse Evolutionary Training
