Clustering Mixtures with Almost Optimal Separation in Polynomial Time
Jerry Li, Allen Liu

TL;DR
This paper introduces a polynomial-time algorithm for clustering high-dimensional Gaussian mixtures with near-optimal separation, significantly improving efficiency over previous methods that required larger separation or quasipolynomial time.
Contribution
The authors develop the first polynomial-time algorithm that nearly matches the information-theoretic separation threshold for clustering Gaussian mixtures.
Findings
Successfully recovers clusters with separation $\Delta = \Omega(\log^{1/2 + c} k)$ for any constant $c > 0$, where $k$ is the number of mixture components
Extends results to mixtures of translated distributions satisfying the Poincaré inequality
Introduces a novel technique for estimating high-degree moments without explicit tensor computation
Abstract
We consider the problem of clustering mixtures of mean-separated Gaussians in high dimensions. We are given samples from a mixture of $k$ identity covariance Gaussians, so that the minimum pairwise distance between any two pairs of means is at least $\Delta$, for some parameter $\Delta > 0$, and the goal is to recover the ground truth clustering of these samples. It is folklore that separation $\Delta = \Theta(\sqrt{\log k})$ is both necessary and sufficient to recover a good clustering, at least information theoretically. However, the estimators which achieve this guarantee are inefficient. We give the first algorithm which runs in polynomial time, and which almost matches this guarantee. More precisely, we give an algorithm which takes polynomially many samples and time, and which can successfully recover a good clustering, so long as the separation is $\Delta = \Omega (\log^{1/2 +…
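The problem setup in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it draws samples from a mixture of identity-covariance Gaussians whose means are pairwise exactly `delta` apart, then labels each sample by its nearest true mean (an oracle that a real clustering algorithm would not have). The function names, the uniform mixing weights, and the placement of the means on scaled basis vectors are all assumptions made for illustration.

```python
import math
import random


def make_means(k, delta):
    """Return k means in R^k at pairwise distance exactly delta.

    Assumption for illustration: means are scaled standard basis vectors.
    """
    scale = delta / math.sqrt(2)
    return [[scale if j == i else 0.0 for j in range(k)] for i in range(k)]


def sample_mixture(means, n, rng):
    """Draw n points from a mixture of identity-covariance Gaussians.

    Assumption: mixing weights are uniform over the components.
    """
    points, labels = [], []
    for _ in range(n):
        i = rng.randrange(len(means))
        points.append([m + rng.gauss(0.0, 1.0) for m in means[i]])
        labels.append(i)
    return points, labels


def nearest_mean(x, means):
    """Index of the mean closest to x in Euclidean distance."""
    return min(
        range(len(means)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, means[i])),
    )


def oracle_accuracy(k, delta, n=500, seed=0):
    """Fraction of samples that the nearest-true-mean rule labels correctly."""
    rng = random.Random(seed)
    means = make_means(k, delta)
    points, labels = sample_mixture(means, n, rng)
    hits = sum(nearest_mean(x, means) == y for x, y in zip(points, labels))
    return hits / n
```

Calling `oracle_accuracy(5, 8.0)` and `oracle_accuracy(5, 1.0)` side by side gives a feel for why the separation parameter matters: when the means are far apart relative to the unit noise, even this simple rule rarely mislabels a point, while at small separation the components overlap heavily.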
