Clustering subgaussian mixtures by semidefinite programming

Dustin G. Mixon; Soledad Villar; Rachel Ward

arXiv:1602.06612 · stat.ML · May 11, 2016

TL;DR

This paper presents a semidefinite programming (SDP) based algorithm for clustering subgaussian mixtures, proves performance guarantees for it, and compares its approximation guarantee with the fundamental limits of estimating Gaussian centers by k-means clustering.

Contribution

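The paper introduces a model-free relax-and-round algorithm for k-means clustering based on a semidefinite relaxation due to Peng and Wei, together with a generic method for proving performance guarantees and an analysis of the algorithm under subgaussian mixture models.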

Findings

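01 The algorithm interprets the SDP output as a denoised version of the original data and then rounds it to a hard clustering.

02 A generic method for proving performance guarantees is provided and applied to subgaussian mixture models.

03 The fundamental limits of estimating Gaussian centers by k-means clustering are studied, so that the approximation guarantee can be compared with the optimal k-means solution.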

Abstract

We introduce a model-free relax-and-round algorithm for k-means clustering based on a semidefinite relaxation due to Peng and Wei. The algorithm interprets the SDP output as a denoised version of the original data and then rounds this output to a hard clustering. We provide a generic method for proving performance guarantees for this algorithm, and we analyze the algorithm in the context of subgaussian mixture models. We also study the fundamental limits of estimating Gaussian centers by k-means clustering in order to compare our approximation guarantee to the theoretically optimal k-means clustering solution.
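The denoising interpretation can be made concrete on a toy instance. The sketch below assumes the Peng and Wei relaxation in the form it is commonly stated (minimize Tr(DX) subject to Tr(X) = k, X1 = 1, X entrywise nonnegative, X positive semidefinite, with D the matrix of squared pairwise distances); that formulation is not spelled out in the abstract. It does not solve an SDP. It only builds the integral feasible point that corresponds to a planted clustering and checks two facts: the objective Tr(DX) equals twice the k-means objective, and the columns of PX (with P the matrix whose columns are the data points) are the cluster centroids. The data, labels, and variable names are hypothetical, and the rounding step is not shown.

```python
# Toy illustration only: 4 hypothetical points in the plane, 2 planted clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]  # assumed planted clustering
k = 2
n = len(points)


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


# Cost matrix of the relaxation: D[i][j] = ||x_i - x_j||^2.
D = [[sqdist(p, q) for q in points] for p in points]

# Integral feasible point for the planted clustering:
# X[i][j] = 1/|cluster| if i and j share a cluster, else 0.
sizes = [labels.count(t) for t in range(k)]
X = [
    [1.0 / sizes[labels[i]] if labels[i] == labels[j] else 0.0 for j in range(n)]
    for i in range(n)
]

# Linear feasibility checks: Tr(X) = k, every row sums to 1, entries >= 0.
trace_X = sum(X[i][i] for i in range(n))
row_sums = [sum(row) for row in X]
nonneg = all(x >= 0.0 for row in X for x in row)

# Relaxation objective Tr(DX).
sdp_value = sum(D[i][j] * X[j][i] for i in range(n) for j in range(n))

# k-means objective of the same clustering (sum of squared distances to centroids).
centroids = []
for t in range(k):
    members = [points[i] for i in range(n) if labels[i] == t]
    centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
kmeans_value = sum(sqdist(points[i], centroids[labels[i]]) for i in range(n))

# "Denoised" points: column j of P X is sum_i X[i][j] * x_i.
denoised = [
    tuple(sum(X[i][j] * points[i][c] for i in range(n)) for c in range(2))
    for j in range(n)
]

print(sdp_value, kmeans_value)
print(denoised)
```

For a non-integral SDP solution the columns of PX are no longer exact centroids, which is why a separate rounding step is needed to obtain a hard clustering.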

Taxonomy

Methods: k-Means Clustering