When Do Birds of a Feather Flock Together? k-Means, Proximity, and Conic Programming
Xiaodong Li, Yang Li, Shuyang Ling, Thomas Strohmer, and Ke Wei

TL;DR
This paper develops improved proximity conditions for convex relaxations of k-means clustering, enabling exact recovery under weaker separation assumptions and applying to various data models like Gaussian mixtures.
Contribution
It introduces an improved proximity condition for the Peng-Wei relaxation of k-means, addresses open problems, and extends the analysis to balanced clusters and to data models such as the stochastic ball model and Gaussian mixtures.
Findings
Improved separation bounds for stochastic ball models.
State-of-the-art results for Gaussian mixture model learning.
A sufficient proximity condition, and a separate necessary one, for exact recovery by the Peng-Wei relaxation.
Abstract
Given a set of data, one central goal is to group them into clusters based on some notion of similarity between the individual objects. One of the most popular and widely used approaches is k-means, despite the computational hardness of finding its global minimum. We study and compare the properties of different convex relaxations by relating them to corresponding proximity conditions, an idea originally introduced by Kumar and Kannan. Using conic duality theory, we present an improved proximity condition under which the Peng-Wei relaxation of k-means recovers the underlying clusters exactly. Our proximity condition improves upon that of Kumar and Kannan, and is comparable to that of Awasthi and Sheffet, where proximity conditions are established for projective k-means. In addition, we provide a necessary proximity condition for the exactness of the Peng-Wei relaxation. For the special case of equal…
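As a rough illustration of the relaxation the abstract refers to: the Peng-Wei relaxation is commonly stated as minimizing ⟨D, Z⟩ over matrices Z that are positive semidefinite, entrywise non-negative, have unit row sums, and have trace k, where D holds the pairwise squared distances. The sketch below is not taken from the paper; the function names and the toy data are assumptions made for illustration. It builds the matrix Z induced by a fixed partition, and lets one check that half of ⟨D, Z⟩ equals the usual k-means cost and that Z satisfies the linear constraints. Positive semidefiniteness also holds for such a Z, but is not checked here.

```python
# Illustrative sketch only: names and toy data are hypothetical, not from the paper.

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_cost(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    cost = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        cost += sum(sq_dist(m, centroid) for m in members)
    return cost

def lifted_matrix(labels):
    """Z[i][j] = 1/|cluster| if points i and j share a cluster, else 0."""
    sizes = {}
    for c in labels:
        sizes[c] = sizes.get(c, 0) + 1
    return [[1.0 / sizes[a] if a == b else 0.0 for b in labels] for a in labels]

def lifted_cost(points, Z):
    """Half the inner product <D, Z>, with D the pairwise squared distances."""
    n = len(points)
    return 0.5 * sum(
        Z[i][j] * sq_dist(points[i], points[j]) for i in range(n) for j in range(n)
    )

# Toy data: two well-separated groups in the plane (assumed for illustration).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0), (10.0, 4.0)]
labels = [0, 0, 1, 1, 1]
Z = lifted_matrix(labels)

direct = kmeans_cost(points, labels)   # cost computed from centroids
lifted = lifted_cost(points, Z)        # same cost computed from Z and D
row_sums = [sum(row) for row in Z]     # each should be 1
trace = sum(Z[i][i] for i in range(len(Z)))  # should equal the number of clusters
```

The relaxation replaces the combinatorial search over partitions with a search over all matrices satisfying these constraints; exact recovery means the optimizer is the Z of the true partition.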
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Bayesian Methods and Mixture Models · Machine Learning and Algorithms
