Convergence rate of stochastic k-means
Cheng Tang, Claire Monteleoni

TL;DR
This paper proves that stochastic k-means algorithms, including online and mini-batch variants, converge to local optima at an $O(1/t)$ rate, with faster convergence under clusterable data and proper initialization.
Contribution
It provides the first theoretical analysis of the convergence rate of stochastic k-means, including conditions for optimal convergence in clusterable datasets.
Findings
Both online and mini-batch k-means converge at $O(1/t)$ rate.
With suitable initialization, mini-batch k-means converges to an optimal k-means solution at an $O(1/t)$ rate with high probability on clusterable data.
Novel geometric and non-convex analysis techniques are introduced for understanding k-means convergence.
Abstract
We analyze online and mini-batch k-means variants. Both scale up the widely used Lloyd's algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that they have global convergence towards local optima at rate $O(1/t)$ under general conditions. In addition, we show that if the dataset is clusterable, then with suitable initialization mini-batch k-means converges to an optimal k-means solution at rate $O(1/t)$ with high probability. The k-means objective is non-convex and non-differentiable: we exploit ideas from non-convex gradient-based optimization by providing a novel characterization of the trajectory of the k-means algorithm on its solution space, and circumvent its non-differentiability via geometric insights about the k-means update.
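To make the kind of update discussed in the abstract concrete, here is a minimal sketch of mini-batch k-means in plain Python. It is an illustration under stated assumptions, not the authors' exact procedure: the per-center step size `1 / counts[j]` follows the commonly used mini-batch k-means recipe and may differ from the learning-rate schedule analyzed in the paper. The function names, the synthetic two-blob dataset, and the initialization from one point per blob are all hypothetical choices made for the example. Setting `batch_size=1` gives an online variant.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def kmeans_cost(data, centers):
    """Average squared distance from each point to its nearest center."""
    return sum(sq_dist(x, centers[nearest(x, centers)]) for x in data) / len(data)

def minibatch_kmeans(data, init_centers, batch_size=10, iters=200, seed=0):
    """Stochastic k-means sketch; the 1/count step size is an assumption."""
    rng = random.Random(seed)
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)
    for _ in range(iters):
        # Sample a mini-batch uniformly with replacement.
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Assign every batch point using the centers as they stand before the step.
        assigned = [nearest(x, centers) for x in batch]
        for x, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # decaying per-center learning rate
            # Move center j a fraction eta of the way towards x.
            centers[j] = [(1 - eta) * c + eta * p for c, p in zip(centers[j], x)]
    return centers

# Hypothetical well-separated ("clusterable") data: two Gaussian blobs.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(200)]
data += [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(200)]

# Hypothetical initialization: one seed point taken from each blob.
centers = minibatch_kmeans(data, [data[0], data[-1]])
final_cost = kmeans_cost(data, centers)
```

With the `1 / counts[j]` step size, each center is the running mean of the points assigned to it so far, so the influence of any single sample shrinks as the algorithm proceeds. Tracking `kmeans_cost` across iterations is one way to observe empirically how quickly the objective settles on a given dataset.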
Taxonomy
Topics: Face and Expression Recognition · Advanced Clustering Algorithms Research · Statistical Methods and Inference
