On the tightness of an SDP relaxation of k-means
Takayuki Iguchi, Dustin G. Mixon, Jesse Peterson, Soledad Villar

TL;DR
This paper analyzes the conditions under which a semidefinite programming relaxation accurately recovers clusters in a probabilistic model of data points distributed around fixed centers, showing high-probability exact recovery when centers are sufficiently separated.
Contribution
It provides a theoretical analysis of the tightness of an SDP relaxation for k-means in a random data model, establishing explicit separation conditions for exact recovery.
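For orientation, the k-means relaxation in question is usually written in the following form. This is a sketch under assumed notation, none of which appears in the summary above: $N$ is the total number of points, $D$ is the $N \times N$ matrix of squared pairwise distances, and $X$ is the matrix variable.

```latex
\begin{aligned}
\text{minimize}\quad   & \operatorname{Tr}(DX) \\
\text{subject to}\quad & \operatorname{Tr}(X) = k, \\
                       & X\mathbf{1} = \mathbf{1}, \\
                       & X \ge 0 \ \text{(entrywise)}, \\
                       & X \succeq 0 .
\end{aligned}
```

The relaxation is called tight when its minimizer is the block matrix $X = \sum_{t=1}^{k} \frac{1}{|A_t|} \mathbf{1}_{A_t} \mathbf{1}_{A_t}^{\top}$ built from the planted clusters $A_1, \dots, A_k$.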
Findings
Exact recovery with high probability when any two ball centers are more than 2 + epsilon apart (the balls have unit radius).
Recovery probability approaches 1 exponentially fast in n, the number of points drawn in each ball.
The margin epsilon is an explicit function of the center configuration and can be made arbitrarily small in high-dimensional settings.
Abstract
Recently, Awasthi et al. introduced an SDP relaxation of the $k$-means problem in $\mathbb{R}^m$. In this work, we consider a random model for the data points in which $k$ balls of unit radius are deterministically distributed throughout $\mathbb{R}^m$, and then in each ball, $n$ points are drawn according to a common rotationally invariant probability distribution. For any fixed ball configuration and probability distribution, we prove that the SDP relaxation of the $k$-means problem exactly recovers these planted clusters with probability $1 - e^{-\Omega(n)}$ provided the distance between any two of the ball centers is $> 2 + \epsilon$, where $\epsilon$ is an explicit function of the configuration of the ball centers, and can be arbitrarily small when $m$ is large.
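The random model in the abstract can be illustrated with a small NumPy sketch. Everything specific here is an assumption made for illustration: the uniform distribution on the unit ball is just one example of a rotationally invariant distribution, and the helper names `sample_planted_balls` and `planted_solution`, the three centers, and `n=50` are hypothetical. The sketch only samples the data and builds the planted block matrix; it does not solve the SDP.

```python
import numpy as np

def sample_planted_balls(centers, n, rng):
    """Draw n points uniformly from the unit ball around each center."""
    k, m = centers.shape
    # Gaussian vectors are rotationally invariant; normalizing gives uniform directions.
    g = rng.standard_normal((k, n, m))
    directions = g / np.linalg.norm(g, axis=2, keepdims=True)
    # A radius of U**(1/m) makes the points uniform inside the unit ball.
    radii = rng.random((k, n, 1)) ** (1.0 / m)
    points = centers[:, None, :] + radii * directions
    labels = np.repeat(np.arange(k), n)
    return points.reshape(k * n, m), labels

def planted_solution(labels, k):
    """Block matrix with entry 1/|cluster| when two points share a cluster, else 0."""
    indicator = (labels[:, None] == np.arange(k)[None, :]).astype(float)
    sizes = indicator.sum(axis=0)
    return (indicator / sizes) @ indicator.T

# Hypothetical configuration: three centers in R^3, pairwise distance above 2.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [0.0, 2.5, 0.0]])
points, labels = sample_planted_balls(centers, n=50, rng=rng)
X = planted_solution(labels, k=3)
```

The matrix `X` built this way has trace `k`, unit row sums, and nonnegative entries, which are the linear constraints in the commonly used form of the relaxation. Exact recovery means an SDP solver run on the squared distances of `points` would return this same matrix.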
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Complexity and Algorithms in Graphs · Stochastic Gradient Optimization Techniques
