Probably certifiably correct k-means clustering
Takayuki Iguchi, Dustin G. Mixon, Jesse Peterson, Soledad Villar

TL;DR
This paper introduces a probably certifiably correct algorithm for k-means clustering, leveraging convex relaxations and dual certificates to efficiently verify optimality and recover planted clusters under the stochastic ball model.
Contribution
It develops a new dual certificate for Peng and Wei's semidefinite relaxation of k-means. The certificate is used to prove that the relaxation recovers planted clusters and to certify a proposed k-means solution in quasilinear time.
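For reference, Peng and Wei's relaxation is commonly stated as below. The notation is ours and may differ from the paper's: points x_1, ..., x_N, a partition A_1, ..., A_k with centroids c_t, the squared-distance matrix D, and the indicator vector 1_{A_t} of cluster A_t. The first line rewrites the k-means objective in terms of a matrix X built from the partition. The relaxation then keeps only the constraints that every such X satisfies.

```latex
% k-means objective of a partition A_1, ..., A_k with centroids c_t
\sum_{t=1}^{k} \sum_{i \in A_t} \|x_i - c_t\|^2
  = \tfrac{1}{2}\operatorname{Tr}(DX),
\qquad D_{ij} = \|x_i - x_j\|^2,
\qquad X = \sum_{t=1}^{k} \tfrac{1}{|A_t|}\,\mathbf{1}_{A_t}\mathbf{1}_{A_t}^{\top}.

% Semidefinite relaxation: optimize over all X satisfying the same constraints
\min_{X \in \mathbb{R}^{N \times N}} \ \tfrac{1}{2}\operatorname{Tr}(DX)
\quad \text{s.t.} \quad
\operatorname{Tr}(X) = k,\quad X\mathbf{1} = \mathbf{1},\quad X \ge 0 \ (\text{entrywise}),\quad X \succeq 0.
```

"Tight" means that the minimizer of this program is the integral matrix X of the planted partition. A dual certificate is a feasible point of the dual program whose value matches that X, which proves its optimality.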
Findings
Peng and Wei's semidefinite relaxation of k-means is tight with high probability under the stochastic ball model.
The dual certificate enables quasilinear-time optimality testing of k-means solutions.
A quasilinear-time version of spectral clustering from Peng and Wei, designed for two clusters, can reliably recover the planted clusters in this model.
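The following minimal sketch samples from the stochastic ball model and splits the sample into two clusters. It assumes the common description of the model, in which each point is a planted center plus a random vector from a rotation-invariant distribution on the unit ball. The uniform distribution on the ball is used here as that law. The split step is a generic principal-direction split: project the centered points onto the top eigenvector of their scatter matrix and cut at zero. This is a swapped-in technique, not necessarily the exact Peng and Wei variant analyzed in the paper. All function names are hypothetical.

```python
import math
import random


def sample_unit_ball(dim, rng):
    """One point drawn uniformly from the unit ball in R^dim (rotation invariant)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Gaussian gives a uniform direction
    norm = math.sqrt(sum(v * v for v in g))
    radius = rng.random() ** (1.0 / dim)  # U^(1/dim) makes the radius uniform in volume
    return [radius * v / norm for v in g]


def stochastic_ball_model(centers, n_per_ball, seed=0):
    """Hypothetical sampler: x = center + r, with r uniform on the unit ball.

    Returns (points, labels), where labels[i] is the planted ball of points[i].
    """
    rng = random.Random(seed)
    dim = len(centers[0])
    points, labels = [], []
    for t, center in enumerate(centers):
        for _ in range(n_per_ball):
            r = sample_unit_ball(dim, rng)
            points.append([c + ri for c, ri in zip(center, r)])
            labels.append(t)
    return points, labels


def two_cluster_spectral_split(points, iters=200, seed=0):
    """Hypothetical two-way split by the sign of the top principal component."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    # dim x dim scatter matrix C = sum_i y_i y_i^T of the centered points y_i.
    cov = [[sum(y[a] * y[b] for y in centered) for b in range(dim)] for a in range(dim)]
    # Power iteration for the top eigenvector of C (a random start avoids orthogonality).
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all points coincide, so there is no direction to split along
        v = [x / norm for x in w]
    # Label each point by the side of the hyperplane through the mean it falls on.
    return [0 if sum(y[j] * v[j] for j in range(dim)) < 0 else 1 for y in centered]
```

For example, you could sample 40 points around each of the centers (-3, 0) and (3, 0) and pass them to two_cluster_spectral_split. With balls this well separated, the result should match the planted labels up to swapping 0 and 1. The scatter matrix is only dim x dim, so the cost grows linearly in the number of points.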
Abstract
Recently, Bandeira [arXiv:1509.00824] introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of k-means clustering. First, we prove that Peng and Wei's semidefinite relaxation of k-means is tight with high probability under a distribution of planted clusters called the stochastic ball model. Our proof follows from a new dual certificate for integral solutions of this semidefinite program. Next, we show how to test the optimality of a proposed k-means solution using this dual certificate in quasilinear time. Finally, we analyze a version of spectral clustering from Peng and Wei that is designed to solve k-means in the case of two clusters. In particular, we show that this quasilinear-time method typically…
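The optimality test starts from a proposed partition, lifts it to the matrix X = sum_t (1/|A_t|) 1_{A_t} 1_{A_t}^T, and asks whether that X is optimal for the semidefinite program. The minimal sketch below does not reproduce the paper's dual-certificate test. It only checks the identity behind the lifting: the k-means objective of a partition equals (1/2) Tr(DX), where D is the matrix of squared distances. Both helper names are hypothetical. The lifted computation is deliberately naive at O(N^2), unlike the quasilinear test described in the abstract.

```python
def kmeans_objective(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[j] for m in members) / len(members) for j in range(dim)]
        total += sum(sum((m[j] - centroid[j]) ** 2 for j in range(dim)) for m in members)
    return total


def lifted_objective(points, labels):
    """(1/2) Tr(D X) for X = sum_t (1/|A_t|) 1_{A_t} 1_{A_t}^T.

    X_ij is nonzero only when i and j share a cluster A_t, where it equals
    1/|A_t|, so only same-cluster pairs contribute D_ij / |A_t|.
    """
    sizes = {}
    for l in labels:
        sizes[l] = sizes.get(l, 0) + 1
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                d_ij = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                total += d_ij / sizes[labels[i]]
    return 0.5 * total


pts = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [10.0, 4.0]]
labs = [0, 0, 1, 1]
print(kmeans_objective(pts, labs), lifted_objective(pts, labs))  # 10.0 10.0
```

In the toy example, the centroids are (1, 0) and (10, 2), so the objective is (1 + 1) + (4 + 4) = 10. The lifted form yields the same value, as the identity predicts.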
Taxonomy
Methods: Spectral Clustering
