Clustering Semi-Random Mixtures of Gaussians
Pranjal Awasthi, Aravindan Vijayaraghavan

TL;DR
This paper introduces a semi-random model for k-means clustering that generalizes the Gaussian mixture model, and shows that Lloyd's algorithm recovers the ground-truth clusters up to small classification error even after an adversary makes arbitrary monotone changes to the data. The result is supported by theoretical bounds.
Contribution
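The paper presents a semi-random model that generalizes GMMs and proves that, assuming sufficient separation between the components, Lloyd's algorithm recovers the ground-truth clustering up to small classification error under this model, with matching lower bounds on misclassification.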
Findings
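Lloyd's algorithm recovers the ground-truth clusters up to small classification error with high probability, assuming certain separation between the components.
The model lets a semi-random adversary make arbitrary "monotone" (helpful) changes to data generated from a Gaussian mixture, which makes it a test of algorithmic robustness.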
Theoretical bounds show near-optimal misclassification rates.
Abstract
Gaussian mixture models (GMM) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural semi-random model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. In our model, a semi-random adversary is allowed to make arbitrary "monotone" or helpful changes to the data generated from the Gaussian mixture model. Our first contribution is a polynomial time algorithm that provably recovers the ground-truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd's algorithm for k-means clustering that is the method-of-choice in practice. Our second result…
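To make the setting in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's algorithm or analysis: the abstract does not define "monotone" precisely, so the adversary below uses one simple interpretation (each point is pulled toward the mean of its own component by an arbitrary factor). The function names, the initialization from one sample per component, and all parameter values are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sample_semi_random(means, n_per, sigma, rng):
    """Draw spherical-Gaussian mixture points, then apply an assumed 'helpful' adversary."""
    points, truth = [], []
    for j, mu in enumerate(means):
        for _ in range(n_per):
            x = [rng.gauss(m, sigma) for m in mu]
            # Assumed adversary (not taken from the source): pull the point
            # toward its own component mean by an arbitrary factor t in [0, 1).
            t = rng.random()
            points.append([m + t * (xi - m) for xi, m in zip(x, mu)])
            truth.append(j)
    return points, truth

def lloyd(points, centers, iters=50):
    """Plain Lloyd's iterations starting from the given initial centers."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

rng = random.Random(0)
means = [(0.0, 0.0), (10.0, 10.0)]  # well-separated components (hypothetical values)
points, truth = sample_semi_random(means, n_per=100, sigma=1.0, rng=rng)
# Hypothetical initialization: one sample from each component, in order,
# so that label j corresponds to true component j.
labels, centers = lloyd(points, [points[0], points[100]])
errors = sum(l != t for l, t in zip(labels, truth))
print(errors, "of", len(points), "points misclassified")
```

The sketch only shows the two alternating steps of Lloyd's algorithm on data perturbed in a helpful direction. The separation condition, the initialization, and the error guarantees that the paper actually proves are not reproduced here.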
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Data Management and Algorithms
