Clustering subgaussian mixtures by semidefinite programming
Dustin G. Mixon, Soledad Villar, Rachel Ward

TL;DR
This paper presents a semidefinite programming (SDP) relax-and-round algorithm for k-means clustering of subgaussian mixtures, proves performance guarantees for it, and compares those guarantees with the fundamental limits of estimating Gaussian centers by k-means clustering.
Contribution
The paper introduces a model-free relax-and-round algorithm for k-means clustering based on the Peng-Wei semidefinite relaxation, together with a generic method for proving performance guarantees and an analysis under subgaussian mixture models.
Findings
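The algorithm interprets the SDP output as a denoised version of the original data, then rounds it to a hard clustering
A generic proof method yields performance guarantees, which are worked out for subgaussian mixture models
The fundamental limits of estimating Gaussian centers by k-means clustering are analyzed to benchmark the approximation guarantee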
Abstract
We introduce a model-free relax-and-round algorithm for k-means clustering based on a semidefinite relaxation due to Peng and Wei. The algorithm interprets the SDP output as a denoised version of the original data and then rounds this output to a hard clustering. We provide a generic method for proving performance guarantees for this algorithm, and we analyze the algorithm in the context of subgaussian mixture models. We also study the fundamental limits of estimating Gaussian centers by k-means clustering in order to compare our approximation guarantee to the theoretically optimal k-means clustering solution.
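The "denoising" reading of the SDP output can be illustrated with a small sketch. It assumes the standard form of the Peng-Wei relaxation (minimize Tr(DX) subject to Tr(X) = k, X1 = 1, X >= 0 entrywise, X positive semidefinite, with D the matrix of squared pairwise distances), which the abstract does not spell out. No SDP solver is called: the sketch substitutes the integral solution X built from a known partition, which is what the relaxation returns when it is tight. All function names and the toy data are hypothetical, chosen for illustration only.

```python
# Illustrative sketch: uses the integral (partition) solution in place of a
# real SDP solver. Function names and the toy data are assumptions.

def partition_matrix(labels, k):
    """X = sum_t (1/|A_t|) 1_{A_t} 1_{A_t}^T for the clusters A_t given by labels."""
    n = len(labels)
    sizes = [labels.count(t) for t in range(k)]
    return [[1.0 / sizes[labels[i]] if labels[i] == labels[j] else 0.0
             for j in range(n)] for i in range(n)]

def is_feasible(X, k, tol=1e-9):
    """Check the linear constraints Tr(X) = k, X1 = 1, X >= 0.
    Positive semidefiniteness holds by construction here and is not checked."""
    n = len(X)
    trace_ok = abs(sum(X[i][i] for i in range(n)) - k) < tol
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in X)
    nonneg_ok = all(x >= -tol for row in X for x in row)
    return trace_ok and rows_ok and nonneg_ok

def denoise(points, X):
    """Denoised point j = sum_i X[i][j] * points[i], i.e. column j of PX
    when the data points are the columns of P."""
    n, dim = len(points), len(points[0])
    return [[sum(X[i][j] * points[i][d] for i in range(n)) for d in range(dim)]
            for j in range(n)]

def sdp_objective(points, X):
    """Tr(DX) with D[i][j] = squared Euclidean distance between points i and j."""
    n = len(points)
    return sum(X[i][j] * sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
               for i in range(n) for j in range(n))

# Toy data: two well-separated pairs of points in the plane.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
labels = [0, 0, 1, 1]

X = partition_matrix(labels, k=2)
denoised = denoise(points, X)
print(is_feasible(X, k=2))       # linear constraints of the relaxation
print(denoised)                  # each point is mapped to its cluster centroid
print(sdp_objective(points, X))  # twice the k-means cost of this partition
```

For an integral X, every denoised point coincides with its cluster centroid, so rounding reduces to grouping identical points. For a non-integral SDP output the denoised points are only approximately clustered, which is why a separate rounding step to a hard clustering is needed.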
Taxonomy
Methods: k-Means Clustering
