Clustering a Mixture of Gaussians with Unknown Covariance
Damek Davis, Mateo Díaz, Kaizheng Wang

TL;DR
This paper studies clustering in Gaussian mixture models whose components share an unknown covariance matrix. It proposes a statistically optimal but apparently intractable Max-Cut formulation and an efficient spectral algorithm that needs more samples, which points to a statistical-computational gap.
Contribution
It introduces a Max-Cut based optimal clustering method, develops a spectral algorithm with provable guarantees, and extends the approach to multi-component mixtures with theoretical analysis.
Findings
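Solutions of the Max-Cut program achieve the optimal misclassification rate once the sample size grows linearly in the dimension, up to a logarithmic factor.
The spectral algorithm attains the same optimal rate but requires a sample size quadratic in the dimension.
Numerical and theoretical evidence suggests that a statistical-computational gap exists.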
Abstract
We investigate a clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally-sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We prove its solutions achieve the optimal misclassification rate when the number of samples grows linearly in the dimension, up to a logarithmic factor. However, solving the Max-Cut problem appears to be computationally intractable. To overcome this, we develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size. Although this sample complexity is worse than that of the Max-Cut problem, we conjecture that no polynomial-time method can perform better. Furthermore, we gather numerical and theoretical evidence that supports the…
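To make the setting concrete, here is a minimal Python sketch of the data model and the error metric. It is not the paper's Max-Cut program or its spectral algorithm. It assumes a symmetric mixture with means +mu and -mu, labels drawn independently with probability 1/2 each, and a diagonal ill-conditioned covariance; the names `sample_mixture`, `misclassification_rate`, and `oracle_labels` are hypothetical. The oracle rule uses the true mean and covariance, which a clustering method does not have, so it only serves as a benchmark against a rule that ignores the covariance.

```python
import math
import numpy as np

def sample_mixture(n, mu, cov, rng):
    """Draw n points from 0.5*N(mu, cov) + 0.5*N(-mu, cov); labels are in {-1, +1}."""
    labels = rng.choice([-1, 1], size=n)
    noise = rng.multivariate_normal(np.zeros(len(mu)), cov, size=n)
    return labels[:, None] * mu + noise, labels

def misclassification_rate(pred, truth):
    """Fraction of wrong labels, minimized over a global sign flip of the clustering."""
    err = np.mean(pred != truth)
    return min(err, 1.0 - err)

def oracle_labels(X, mu, cov):
    """Bayes rule for this model when mu and cov are known: sign of x^T cov^{-1} mu."""
    scores = X @ np.linalg.solve(cov, mu)
    return np.where(scores >= 0, 1, -1)

rng = np.random.default_rng(0)
d, n = 5, 20000
cov = np.diag(np.logspace(0, 3, d))  # assumed covariance, condition number 1000
mu = np.ones(d)                      # assumed mean direction
X, y = sample_mixture(n, mu, cov, rng)

# Mahalanobis norm of mu; the Bayes error for this model is Phi(-snr).
snr = math.sqrt(mu @ np.linalg.solve(cov, mu))
bayes_error = 0.5 * math.erfc(snr / math.sqrt(2))

oracle_error = misclassification_rate(oracle_labels(X, mu, cov), y)
# Rule that knows mu but ignores the covariance: sign of x^T mu.
naive_error = misclassification_rate(np.where(X @ mu >= 0, 1, -1), y)

print(f"Bayes error: {bayes_error:.3f}")
print(f"oracle rule: {oracle_error:.3f}")
print(f"covariance-blind rule: {naive_error:.3f}")
```

In this toy setup the covariance-blind rule is far worse than the oracle, which illustrates why an unknown, ill-conditioned covariance makes the clustering problem hard: the direction that separates the clusters depends on the inverse covariance.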
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Advanced Statistical Methods and Models
