Consistency of Lloyd's Algorithm Under Perturbations
Dhruv Patel, Hui Shen, Shankar Bhamidi, Yufeng Liu, Vladas Pipiras

TL;DR
This paper proves that Lloyd's algorithm maintains an exponentially bounded mis-clustering rate under small perturbations, extending previous results to more realistic data scenarios involving pre-processing steps.
Contribution
It demonstrates that Lloyd's algorithm remains effective under data perturbations when combined with proper initialization, with implications for various clustering applications.
Findings
Mis-clustering rate remains exponentially bounded under small perturbations.
Proper initialization ensures the correctness of Lloyd's algorithm in perturbed settings.
Results apply to high-dimensional data, time series, and network community detection.
Abstract
In the context of unsupervised learning, Lloyd's algorithm is one of the most widely used clustering algorithms. It has inspired a plethora of work investigating the correctness of the algorithm under various settings with ground truth clusters. In particular, in 2016, Lu and Zhou showed that the mis-clustering rate of Lloyd's algorithm on n independent samples from a sub-Gaussian mixture is exponentially bounded after O(log n) iterations, assuming proper initialization of the algorithm. However, in many applications, the true samples are unobserved and need to be learned from the data via pre-processing pipelines such as spectral methods on appropriate data matrices. We show that the mis-clustering rate of Lloyd's algorithm on perturbed samples from a sub-Gaussian mixture is also exponentially bounded after O(log n) iterations under the assumptions of proper…
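The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a two-component Gaussian mixture (a special case of sub-Gaussian), models the perturbation as small additive noise rather than the output of a spectral pre-processing step, and stands in for "proper initialization" by starting the centers near the true means. The names `lloyd` and `misclustering_rate` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def lloyd(X, init_centers, n_iter=20):
    """Plain Lloyd iterations: assign points to nearest center, then re-average."""
    centers = np.array(init_centers, dtype=float)
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(centers.shape[0]):
            members = X[labels == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    return labels, centers


def misclustering_rate(labels, truth):
    """Fraction of mislabeled points for two clusters, up to label swapping."""
    err = np.mean(labels != truth)
    return min(err, 1.0 - err)


rng = np.random.default_rng(0)
n = 400
true_means = np.array([[-3.0, 0.0], [3.0, 0.0]])  # assumed well-separated means
truth = rng.integers(0, 2, size=n)

# Unobserved "true" samples from the mixture.
clean = true_means[truth] + rng.normal(size=(n, 2))
# Observed samples: assumed small additive perturbation of the true samples.
perturbed = clean + 0.1 * rng.normal(size=(n, 2))

# Assumed stand-in for proper initialization: centers close to the true means.
init = true_means + 0.5

labels, centers = lloyd(perturbed, init)
rate = misclustering_rate(labels, truth)
print(f"mis-clustering rate on perturbed samples: {rate:.4f}")
```

Increasing the perturbation scale or moving `init` far from the true means is a simple way to see informally why the result needs both a small-perturbation condition and an initialization condition.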
