Statistical and Computational Guarantees of Lloyd's Algorithm and its Variants
Yu Lu, Harrison H. Zhou

TL;DR
This paper gives a theoretical analysis of Lloyd's algorithm, showing that under suitable initialization it converges to an exponentially small, minimax optimal clustering error on sub-Gaussian mixtures, and it extends the algorithm and its analysis to community detection and crowdsourcing.
Contribution
It establishes statistical and computational guarantees for Lloyd's algorithm, including convergence and clustering error bounds, and introduces variants for community detection and crowdsourcing with improved guarantees.
Findings
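Under suitable initialization, Lloyd's algorithm reaches an exponentially small clustering error after order log n iterations (n is the sample size), and this error rate is minimax optimal.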
For two-mixture clustering, initialization slightly better than random suffices.
Proposed variants outperform previous methods in community detection and crowdsourcing.
Abstract
Clustering is a fundamental problem in statistics and machine learning. Lloyd's algorithm, proposed in 1957, is still possibly the most widely used clustering algorithm in practice due to its simplicity and empirical performance. However, there has been little theoretical investigation on the statistical and computational guarantees of Lloyd's algorithm. This paper is an attempt to bridge this gap between practice and theory. We investigate the performance of Lloyd's algorithm on clustering sub-Gaussian mixtures. Under an appropriate initialization for labels or centers, we show that Lloyd's algorithm converges to an exponentially small clustering error after an order of log n iterations, where n is the sample size. The error rate is shown to be minimax optimal. For the two-mixture case, we only require the initializer to be slightly better than random guess. In addition, we…
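For readers unfamiliar with the procedure the abstract refers to, the following is a minimal sketch of the generic Lloyd iteration: alternate between assigning each point to its nearest center and moving each center to the mean of its assigned points. It is an illustration only, not the authors' code. The function name `lloyd`, the toy data, and the deliberately poor starting centers are assumptions made for this example, and the sketch does not implement the initialization conditions or the community detection and crowdsourcing variants analyzed in the paper.

```python
def lloyd(points, centers, max_iter=100):
    """Generic Lloyd iteration on a list of points (tuples of floats).

    `centers` is the initial guess; the quality of this guess is what
    the paper's initialization conditions are about.
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest center
        # (squared Euclidean distance).
        new_labels = [
            min(range(len(centers)),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # labels stopped changing, so the centers are stable too
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
# Both starting centers sit inside the first group on purpose.
init_centers = [(0.0, 0.0), (1.0, 0.0)]

labels, centers = lloyd(points, init_centers)
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)
```

On this toy input the labels settle after two updates even though both starting centers lie in the same group. That is a property of this well-separated example, not a demonstration of the paper's guarantees.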
Taxonomy
Topics: Statistical Mechanics and Entropy · Advanced Mathematical Theories and Applications · Statistical and numerical algorithms
