Asymptotic Theory of Geometric and Adaptive $k$-Means Clustering
Adam Quinn Jaffe

TL;DR
This paper extends the asymptotic theory of $k$-means clustering to geometric and adaptive settings, demonstrating strong consistency and deriving new limit theorems without requiring uniqueness of optimal centers.
Contribution
The paper provides a unified framework for analyzing $k$-means and related clustering methods when the data live in general geometric spaces and when some parameters are selected from the data.
Findings
All considered clustering procedures are strongly consistent.
The theory applies to data on Riemannian manifolds, Banach spaces, and Wasserstein space.
Many asymptotic limit theorems are derived beyond strong consistency.
Abstract
We revisit Pollard's classical result on consistency for $k$-means clustering in Euclidean space, with a focus on extensions in two directions: first, to problems where the data may come from interesting geometric settings (e.g., Riemannian manifolds, reflexive Banach spaces, or the Wasserstein space); second, to problems where some parameters are chosen adaptively from the data (e.g., $k$-medoids or elbow-method $k$-means). Towards this end, we provide a general theory which shows that all clustering procedures described above are strongly consistent. In fact, our method of proof allows us to derive many asymptotic limit theorems beyond strong consistency. We also remove all assumptions about uniqueness of the set of optimal cluster centers.
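To make the objects in the abstract concrete, here is a minimal sketch of an empirical clustering cost in a general metric space, with $k$-medoids treated as the same problem with centers restricted to the observed data points. This is an illustration and not the paper's construction: the function names, the squared-distance cost, and the exhaustive search are all assumptions chosen for readability.

```python
from itertools import combinations


def clustering_cost(data, centers, dist):
    """Average squared distance from each point to its nearest center.

    `dist` is any metric on the data space (an assumption of this sketch;
    the squared exponent follows classical k-means).
    """
    return sum(min(dist(x, c) for c in centers) ** 2 for x in data) / len(data)


def brute_force_k_medoids(data, k, dist):
    """Search all k-subsets of the data for the lowest-cost set of centers.

    Restricting centers to data points is what makes this a k-medoids-style
    problem. Exhaustive search is only feasible for very small inputs.
    """
    return min(
        combinations(data, k),
        key=lambda centers: clustering_cost(data, centers, dist),
    )


def abs_dist(a, b):
    """Metric on the real line, used here as a toy example."""
    return abs(a - b)


points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
medoids = brute_force_k_medoids(points, 2, abs_dist)
cost = clustering_cost(points, medoids, abs_dist)
```

Because `dist` is passed in as a function, the same cost applies to any space with a computable metric. Only the search over candidate centers changes from one setting to another.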
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Topological and Geometric Data Analysis
