Consistency and Inconsistency in $K$-Means Clustering
Moïse Blanchard, Adam Quinn Jaffe, Nikita Zhivotovskiy

TL;DR
This paper examines the conditions under which $k$-means clustering is consistent when only finite expectation is assumed, revealing subtle inconsistencies caused by outliers and imbalance, and proposing conditions for recovering consistency.
Contribution
The paper extends the study of $k$-means consistency from the finite-variance setting to the weaker assumption of a finite expectation, and identifies outlier-driven cluster imbalance as the main obstacle to convergence.
Findings
Empirical cluster centers may fail to converge despite unique population centers.
Inconsistency arises from an extreme form of cluster imbalance: outlying samples leave some empirical clusters with very few points.
Imposing balance conditions can restore asymptotic consistency.
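The imbalance described in these findings can be observed in a small simulation. The following is a minimal sketch, not the paper's construction: it solves 2-means exactly in one dimension (where an optimal clustering is a contiguous split of the sorted sample) and reports the cluster sizes. The function name `best_two_means_split` and the choice of a Pareto-type sample with shape 1.5 (finite mean, infinite variance) are assumptions made for illustration only.

```python
import numpy as np


def best_two_means_split(x):
    """Exact 2-means in 1D: try every contiguous split of the sorted sample."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    prefix = np.cumsum(xs)          # running sums of the sorted values
    prefix_sq = np.cumsum(xs ** 2)  # running sums of squares

    best_cost, best_m = np.inf, None
    for m in range(1, n):  # left cluster = first m sorted points
        left_sum, left_sq = prefix[m - 1], prefix_sq[m - 1]
        right_sum = prefix[-1] - left_sum
        right_sq = prefix_sq[-1] - left_sq
        # Within-cluster sum of squares: sum(x^2) - (sum x)^2 / size
        cost = (left_sq - left_sum ** 2 / m) + (right_sq - right_sum ** 2 / (n - m))
        if cost < best_cost:
            best_cost, best_m = cost, m

    left_center = prefix[best_m - 1] / best_m
    right_center = (prefix[-1] - prefix[best_m - 1]) / (n - best_m)
    return (left_center, right_center), (best_m, n - best_m)


if __name__ == "__main__":
    # Illustrative heavy-tailed sample (assumption): shape 1.5 gives a finite
    # mean but infinite variance.
    rng = np.random.default_rng(0)
    sample = rng.pareto(1.5, size=2000)
    centers, sizes = best_two_means_split(sample)
    print("centers:", centers)
    print("cluster sizes:", sizes)
```

Rerunning with different seeds and sample sizes shows how the size of the smaller cluster behaves; a simulation of this kind only illustrates imbalance and does not by itself establish inconsistency.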
Abstract
A celebrated result of Pollard proves asymptotic consistency for $k$-means clustering when the population distribution has finite variance. In this work, we point out that the population-level $k$-means clustering problem is, in fact, well-posed under the weaker assumption of a finite expectation, and we investigate whether some form of asymptotic consistency holds in this setting. As we illustrate in a variety of negative results, the complete story is quite subtle; for example, the empirical $k$-means cluster centers may fail to converge even if there exists a unique set of population $k$-means cluster centers. A detailed analysis of our negative results reveals that inconsistency arises because of an extreme form of cluster imbalance, whereby the presence of outlying samples leads to some empirical $k$-means clusters possessing very few points. We then give a collection of positive…
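One standard way to see why a finite expectation can suffice for the population problem is the rearrangement below. The notation is assumed here and may differ from the paper's: $X$ is a random sample and $c_1, \dots, c_k$ are candidate centers.

$$\min_{1 \le j \le k} \|X - c_j\|^2 - \|X\|^2 = \min_{1 \le j \le k} \bigl( \|c_j\|^2 - 2 \langle X, c_j \rangle \bigr)$$

The right-hand side is bounded in absolute value by $\max_j \bigl( \|c_j\|^2 + 2 \|X\| \, \|c_j\| \bigr)$, so its expectation is finite whenever $\mathbb{E}\|X\| < \infty$. When the variance is finite, this shifted objective differs from the usual one, $\mathbb{E} \min_j \|X - c_j\|^2$, only by the constant $\mathbb{E}\|X\|^2$, so both have the same minimizers.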
Taxonomy
Topics: Advanced Clustering Algorithms Research · Facility Location and Emergency Management · Bayesian Methods and Mixture Models
Methods: Sparse Evolutionary Training
