Linear Time Clustering for High Dimensional Mixtures of Gaussian Clouds
Dan Kushnir, Shirin Jalali, Iraj Saniee

TL;DR
This paper introduces a clustering algorithm for high-dimensional mixtures of two Gaussians whose expected running time is linear in the number of points and quasi-linear in the dimension, improving on the running time of previous polynomial-time methods.
Contribution
The paper presents a novel random projection-based clustering algorithm with expected linear time complexity in data size and quasi-linear in dimension, applicable to mixtures of Gaussian distributions.
Findings
Expected number of projections is bounded by o(ln p)
Algorithm achieves linear time in n and quasi-linear in p
Sample complexity is independent of p
Abstract
Clustering mixtures of Gaussian distributions is a fundamental and challenging problem that is ubiquitous in high-dimensional data processing tasks. While state-of-the-art work on learning Gaussian mixture models has focused primarily on improving separation bounds and generalizing them to arbitrary classes of mixture models, less attention has been paid to the practical computational efficiency of the proposed solutions. In this paper, we propose a novel and highly efficient clustering algorithm for points drawn from a mixture of two arbitrary Gaussian distributions in p-dimensional space. The algorithm performs random 1-dimensional projections until it finds a direction that yields a user-specified clustering error. Under a condition on the 1-dimensional separation parameter, the expected number of such projections is shown to be bounded by…
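The project-until-separated loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract excerpt does not say how the clustering error of a projection is estimated, so the sketch assumes a simple stand-in. It splits each 1-dimensional projection with 2-means, estimates a separation gamma = (m2 - m1) / (s1 + s2) from the two sides, and accepts the direction once gamma reaches the standard normal quantile for 1 - target_error (for two 1-dimensional Gaussians cut at the point that balances both tails, each tail mass equals the target error at that gamma). All function names, the max_tries cap, and the make_mixture test data are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def random_unit_vector(p, rng):
    """Draw a direction uniformly on the unit sphere in p dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def split_1d(values, iters=20):
    """Assumed stand-in: 1-D 2-means split.

    Returns the threshold and a separation estimate
    gamma = (mean_right - mean_left) / (std_left + std_right).
    """
    t = fmean(values)
    for _ in range(iters):
        left = [v for v in values if v <= t]
        right = [v for v in values if v > t]
        if not left or not right:
            return t, 0.0
        new_t = 0.5 * (fmean(left) + fmean(right))
        if abs(new_t - t) < 1e-12:
            break
        t = new_t
    left = [v for v in values if v <= t]
    right = [v for v in values if v > t]
    if not left or not right:
        return t, 0.0
    spread = pstdev(left) + pstdev(right)
    gamma = (fmean(right) - fmean(left)) / spread if spread > 0 else 0.0
    return t, gamma


def cluster_by_random_projection(points, target_error, max_tries=1000, seed=0):
    """Try random 1-D projections until one looks separated enough.

    Returns (labels, number_of_projections_tried); labels is None if no
    direction passed within max_tries (a cap added for this sketch).
    """
    rng = random.Random(seed)
    p = len(points[0])
    # Assumed acceptance rule: gamma >= Q^{-1}(target_error).
    gamma_needed = NormalDist().inv_cdf(1.0 - target_error)
    for tries in range(1, max_tries + 1):
        u = random_unit_vector(p, rng)
        proj = [sum(a * b for a, b in zip(x, u)) for x in points]
        t, gamma = split_1d(proj)
        if gamma >= gamma_needed:
            return [int(v > t) for v in proj], tries
    return None, max_tries


def make_mixture(n_per_cluster, p, shift, rng):
    """Hypothetical test data: two unit-covariance Gaussians whose
    means differ by `shift` along the first coordinate."""
    pts, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_cluster):
            x = [rng.gauss(0.0, 1.0) for _ in range(p)]
            x[0] += label * shift
            pts.append(x)
            labels.append(label)
    return pts, labels
```

Each attempt costs one pass over the n points with a p-term dot product per point, so the per-projection work in this sketch is proportional to n times p; the total cost then depends on how many directions are tried before one is accepted, which is the quantity the paper bounds in expectation.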
Taxonomy
Topics: Bayesian Methods and Mixture Models · Machine Learning and Algorithms · Advanced Clustering Algorithms Research
