Fast estimation of Gaussian mixture components via centering and singular value thresholding
Huan Qing

TL;DR
This paper introduces a fast, non-iterative method for estimating the number of components in high-dimensional Gaussian mixture models using data centering and singular value thresholding, effective even with many components and class imbalance.
Contribution
It proposes a simple, fast estimator that requires neither iterative fitting nor prior knowledge of the number of components, with a consistency guarantee under a mild separation condition on the component centers.
Findings
The estimator consistently recovers the true number of components under mild separation.
It performs accurately in high-dimensional, many-component, and imbalanced scenarios.
The method is computationally efficient, processing large datasets within minutes.
Abstract
Estimating the number of components is a fundamental challenge in unsupervised learning, particularly when dealing with high-dimensional data with many components or severely imbalanced component sizes. This paper addresses this challenge for classical Gaussian mixture models. The proposed estimator is simple: center the data, compute the singular values of the centered matrix, and count those above a threshold. No iterative fitting, no likelihood calculation, and no prior knowledge of the number of components are required. We prove that, under a mild separation condition on the component centers, the estimator consistently recovers the true number of components. The result holds in high-dimensional settings where the dimension can be much larger than the sample size. It also holds when the number of components grows to the smaller of the dimension and the sample size, even under severe…
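The three steps named in the abstract (center, compute singular values, count those above a threshold) can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name estimate_num_components and the threshold argument are hypothetical, and the abstract as excerpted does not state how the threshold is chosen or how the count is mapped to the final estimate, so the sketch simply returns the raw count for a user-supplied threshold.

```python
import numpy as np


def estimate_num_components(X, threshold):
    """Count singular values of the column-centered data above `threshold`.

    X         : (n_samples, n_features) data matrix.
    threshold : cutoff on the singular values. ASSUMPTION: supplied by the
                caller; the paper's own threshold is not given in the excerpt.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: center the data by subtracting the per-feature sample mean.
    X_centered = X - X.mean(axis=0, keepdims=True)
    # Step 2: singular values only; no singular vectors are needed.
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    # Step 3: count the singular values that exceed the threshold.
    return int(np.sum(singular_values > threshold))


# Noise-free toy data: 3 distinct centers in 5 dimensions, 4 points each.
centers = 10.0 * np.eye(3, 5)
X_toy = np.repeat(centers, 4, axis=0)

# Centering removes one direction from the span of the centers, so the raw
# count for this toy input is 2 rather than 3.
print(estimate_num_components(X_toy, threshold=1.0))
```

On real data the threshold would have to sit above the largest singular value contributed by noise alone; any specific value used here is illustrative only.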
