Can clustering scale sublinearly with its clusters? A variational EM acceleration of GMMs and $k$-means
Dennis Forster, Jörg Lücke

TL;DR
This paper introduces a variational EM-based approach that reduces the computational complexity of clustering algorithms like k-means and GMMs, enabling sublinear scaling with the number of clusters while maintaining effectiveness.
Contribution
It presents novel theoretical results on truncated variational EM to achieve sublinear complexity in clustering algorithms, significantly reducing computational demands for large cluster counts.
Findings
Achieves sublinear scaling with cluster number C in clustering iterations.
Reduces computational complexity by two to three orders of magnitude for large C.
Maintains clustering quality comparable to traditional methods.
Abstract
One iteration of standard $k$-means (i.e., Lloyd's algorithm) or standard EM for Gaussian mixture models (GMMs) scales linearly with the number of clusters $C$, data points $N$, and data dimensionality $D$. In this study, we explore whether one iteration of $k$-means or EM for GMMs can scale sublinearly with $C$ at run-time, while improving the clustering objective remains effective. The tool we apply for complexity reduction is variational EM, which is typically used to make training of generative models with exponentially many hidden states tractable. Here, we apply novel theoretical results on truncated variational EM to make tractable clustering algorithms more efficient. The basic idea is to use a partial variational E-step which reduces the linear complexity of $\mathcal{O}(NCD)$ required for a full E-step to a sublinear complexity. Our main observation is that the linear…
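The partial E-step idea can be illustrated for the $k$-means case with a minimal sketch. It assumes each data point carries a hypothetical candidate set of K ≤ C cluster indices and evaluates distances only to those clusters, so the assignment step costs O(NKD) rather than O(NCD). The function name `truncated_kmeans_step` and the `candidates` array are illustrative assumptions; the excerpt above does not show how the paper selects or updates such sets, so they are simply taken as input here. This is not the authors' algorithm.

```python
import numpy as np

def truncated_kmeans_step(X, means, candidates):
    """One k-means-like iteration restricted to per-point candidate clusters.

    X:          (N, D) float array of data points
    means:      (C, D) float array of cluster centers
    candidates: (N, K) int array, K <= C; clusters considered for each point
                (hypothetical input, how to choose it is left open here)
    """
    # Partial E-step: squared distances to candidate clusters only,
    # i.e. O(N * K * D) work instead of O(N * C * D).
    diffs = X[:, None, :] - means[candidates]           # (N, K, D)
    d2 = np.einsum("nkd,nkd->nk", diffs, diffs)         # (N, K)
    best = np.argmin(d2, axis=1)                        # position within candidates
    labels = candidates[np.arange(X.shape[0]), best]    # global cluster index

    # M-step: each center becomes the mean of its assigned points;
    # clusters that received no points keep their previous center.
    C = means.shape[0]
    counts = np.bincount(labels, minlength=C)
    sums = np.zeros_like(means)
    np.add.at(sums, labels, X)
    new_means = means.copy()
    nonempty = counts > 0
    new_means[nonempty] = sums[nonempty] / counts[nonempty, None]
    return labels, new_means
```

If `candidates` lists all C clusters for every point, the step reduces to an ordinary Lloyd iteration; the saving comes only from keeping K much smaller than C.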
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Time Series Analysis and Forecasting
