Clustering Mixtures with Almost Optimal Separation in Polynomial Time

Jerry Li; Allen Liu

arXiv:2112.00706·cs.DS·December 2, 2021

TL;DR

This paper gives a polynomial-time algorithm for clustering high-dimensional Gaussian mixtures whose means have nearly optimal separation. Previous methods either required larger separation or ran in quasipolynomial time.

Contribution

The authors develop the first polynomial-time algorithm that nearly matches the information-theoretic separation threshold for clustering Gaussian mixtures.

Findings

01

Successfully recovers clusters with separation $\Delta = \Omega(\log^{1/2 + c} k)$ for any $c > 0$

02

Extends results to mixtures of translated distributions satisfying the Poincaré inequality

03

Introduces a novel technique for estimating high-degree moments without explicit tensor computation

Abstract

We consider the problem of clustering mixtures of mean-separated Gaussians in high dimensions. We are given samples from a mixture of $k$ identity covariance Gaussians, so that the minimum pairwise distance between any pair of means is at least $\Delta$, for some parameter $\Delta > 0$, and the goal is to recover the ground truth clustering of these samples. It is folklore that separation $\Delta = \Theta(\sqrt{\log k})$ is both necessary and sufficient to recover a good clustering, at least information theoretically. However, the estimators which achieve this guarantee are inefficient. We give the first algorithm which runs in polynomial time, and which almost matches this guarantee. More precisely, we give an algorithm which takes polynomially many samples and time, and which can successfully recover a good clustering, so long as the separation is $\Delta = \Omega (\log^{1/2 +…
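
To make the problem setting concrete, the following is a minimal sketch of the input model only, not the paper's algorithm. It draws samples from a mixture of $k$ identity covariance Gaussians whose means are pairwise $\Delta$-separated and keeps the ground truth labels that a clustering algorithm would try to recover. The uniform mixing weights, the placement of means on scaled basis vectors, the constant 3.0, and the helper names `min_separation` and `sample_mixture` are all assumptions made for illustration.

```python
import itertools
import math
import random


def min_separation(means):
    """Smallest Euclidean distance between any two distinct means."""
    return min(math.dist(a, b) for a, b in itertools.combinations(means, 2))


def sample_mixture(means, n, rng):
    """Draw n points from a mixture of identity-covariance Gaussians.

    Returns (points, labels); labels are the ground-truth clustering.
    Mixing weights are uniform, which is an assumption of this sketch.
    """
    points, labels = [], []
    for _ in range(n):
        j = rng.randrange(len(means))  # pick a component uniformly
        # Add independent N(0, 1) noise to every coordinate of the mean.
        points.append([m + rng.gauss(0.0, 1.0) for m in means[j]])
        labels.append(j)
    return points, labels


rng = random.Random(0)
k, d = 4, 8
# Separation on the order of sqrt(log k); the constant 3.0 is arbitrary.
delta = 3.0 * math.sqrt(math.log(k))
# Means on scaled standard basis vectors, so every pairwise distance is delta.
scale = delta / math.sqrt(2)
means = [[scale if i == j else 0.0 for i in range(d)] for j in range(k)]
points, labels = sample_mixture(means, 1000, rng)
```

Here `min_separation(means)` plays the role of $\Delta$, and `labels` is the ground truth clustering of `points`.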
