Large Scale Clustering with Variational EM for Gaussian Mixture Models
Florian Hirschberger, Dennis Forster, Jörg Lücke

TL;DR
This paper introduces a sublinear variational EM algorithm for large-scale Gaussian mixture model clustering, demonstrating significant speedups and scalability to datasets with millions of data points and thousands of clusters.
Contribution
The paper presents a novel sublinear variational EM algorithm combined with coreset methods for efficient large-scale GMM clustering, extending previous work to handle massive datasets.
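To make the idea of a variational (truncated) EM step concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: it assumes isotropic clusters with one shared variance and equal mixing proportions, and the names `nearest_candidates`, `truncated_em_step`, and `K_sets` are hypothetical. Each data point evaluates only a small candidate set of clusters instead of all of them. The candidate sets are chosen here by a brute-force distance comparison purely for brevity; a sublinear scheme would need to avoid exactly that full comparison.

```python
import numpy as np

def nearest_candidates(X, means, n_candidates):
    # Hypothetical helper: brute-force choice of the closest clusters per point.
    # This costs O(N * C) and is used only to keep the sketch short.
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d2, axis=1)[:, :n_candidates]

def truncated_em_step(X, means, sigma2, K_sets):
    # One EM iteration in which point n only considers the clusters in K_sets[n].
    # Assumptions: isotropic clusters, shared variance sigma2, equal mixing weights.
    N, D = X.shape
    C = means.shape[0]

    # E-step restricted to the candidate sets: squared distances, shape (N, C')
    diff = X[:, None, :] - means[K_sets]
    d2 = np.einsum("ncd,ncd->nc", diff, diff)
    logp = -0.5 * d2 / sigma2
    logp -= logp.max(axis=1, keepdims=True)   # numerical stability
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)         # truncated responsibilities

    # M-step: accumulate sufficient statistics only for visited clusters
    resp_sum = np.zeros(C)
    np.add.at(resp_sum, K_sets, r)
    weighted = np.zeros((C, D))
    np.add.at(weighted, K_sets, r[:, :, None] * X[:, None, :])

    new_means = means.copy()
    active = resp_sum > 0                     # clusters never visited stay unchanged
    new_means[active] = weighted[active] / resp_sum[active, None]

    # Shared variance update using the updated means
    diff_new = X[:, None, :] - new_means[K_sets]
    d2_new = np.einsum("ncd,ncd->nc", diff_new, diff_new)
    new_sigma2 = (r * d2_new).sum() / (N * D)
    return new_means, new_sigma2, r
```

In this sketch the per-iteration cost of the E-step and M-step scales with the candidate-set size rather than with the total number of clusters. Running the same step on a weighted subset of the data (a coreset) would additionally require multiplying each point's responsibilities by its weight.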
Findings
Achieved substantial speedups over existing clustering methods.
Successfully clustered 80 million images into 32,000 clusters.
Demonstrated scalability and efficiency in large-scale clustering tasks.
Abstract
This paper represents a preliminary (pre-review) version of a sublinear variational algorithm for isotropic Gaussian mixture models (GMMs). Further developments of the algorithm for GMMs with diagonal covariance matrices (instead of isotropic clusters) and their corresponding benchmarking results have been published in TPAMI (doi:10.1109/TPAMI.2021.3133763) in the paper "A Variational EM Acceleration for Efficient Clustering at Very Large Scales". We kindly refer the reader to the TPAMI paper instead of this much earlier arXiv version (the TPAMI paper is also open access). Publicly available source code accompanies the paper (see https://github.com/variational-sublinear-clustering). Please note that the TPAMI paper no longer contains the benchmark on the 80 Million Tiny Images dataset because we followed the call of the dataset creators to discontinue the use of that dataset.…
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Bayesian Methods and Mixture Models · Advanced Image and Video Retrieval Techniques
Methods: Coresets
