An Open Source C++ Implementation of Multi-Threaded Gaussian Mixture Models, k-Means and Expectation Maximisation
Conrad Sanderson, Ryan Curtin

TL;DR
This paper presents a fast, robust, multi-threaded C++ implementation of Gaussian mixture models (GMMs), trained with k-means and Expectation Maximisation (EM), that improves training speed and modelling accuracy for density modelling in machine learning applications.
Contribution
It reformulates the EM and k-means training algorithms for GMMs into a multi-threaded, MapReduce-like framework, and adds techniques that improve numerical stability and modelling accuracy.
Findings
Achieves a speedup of an order of magnitude on a recent 16-core machine.
Can achieve higher modelling accuracy than a previous public implementation.
The implementation is included in the open source Armadillo library, under the Apache 2.0 license.
Abstract
Modelling of multivariate densities is a core component in many signal processing, pattern recognition and machine learning applications. The modelling is often done via Gaussian mixture models (GMMs), which use computationally expensive and potentially unstable training algorithms. We provide an overview of a fast and robust implementation of GMMs in the C++ language, employing multi-threaded versions of the Expectation Maximisation (EM) and k-means training algorithms. Multi-threading is achieved through reformulation of the EM and k-means algorithms into a MapReduce-like framework. Furthermore, the implementation uses several techniques to improve numerical stability and modelling accuracy. We demonstrate that the multi-threaded implementation achieves a speedup of an order of magnitude on a recent 16 core machine, and that it can achieve higher modelling accuracy than a previously…
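The MapReduce-like reformulation mentioned in the abstract can be illustrated with a minimal sketch of a single k-means iteration. This is not the Armadillo code: it uses only the C++ standard library, and the names `kmeans_step`, `Partial` and `sq_dist` are hypothetical, introduced here for illustration. It assumes each thread processes a contiguous chunk of samples and keeps private per-cluster sums and counts (the map step), which are then combined into new means (the reduce step).

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical per-thread accumulator: running sum and sample count per cluster.
struct Partial {
    std::vector<Vec> sums;
    std::vector<std::size_t> counts;
};

// Squared Euclidean distance between two vectors of equal length.
static double sq_dist(const Vec& a, const Vec& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

// One k-means iteration in a map/reduce style (illustrative sketch only).
std::vector<Vec> kmeans_step(const std::vector<Vec>& data,
                             const std::vector<Vec>& means,
                             std::size_t n_threads) {
    assert(!data.empty() && !means.empty() && n_threads > 0);
    const std::size_t N = data.size();
    const std::size_t K = means.size();
    const std::size_t D = means[0].size();

    // Each thread gets its own accumulator, so no locking is needed.
    std::vector<Partial> partials(
        n_threads,
        Partial{std::vector<Vec>(K, Vec(D, 0.0)), std::vector<std::size_t>(K, 0)});
    std::vector<std::thread> workers;

    // Map: assign each sample in the chunk to its nearest mean and accumulate.
    for (std::size_t t = 0; t < n_threads; ++t) {
        const std::size_t begin = t * N / n_threads;
        const std::size_t end = (t + 1) * N / n_threads;
        workers.emplace_back([&, t, begin, end] {
            Partial& p = partials[t];
            for (std::size_t n = begin; n < end; ++n) {
                std::size_t best = 0;
                double best_d = std::numeric_limits<double>::max();
                for (std::size_t k = 0; k < K; ++k) {
                    const double d = sq_dist(data[n], means[k]);
                    if (d < best_d) { best_d = d; best = k; }
                }
                for (std::size_t i = 0; i < D; ++i) p.sums[best][i] += data[n][i];
                ++p.counts[best];
            }
        });
    }
    for (auto& w : workers) w.join();

    // Reduce: combine the per-thread accumulators into updated means.
    std::vector<Vec> new_means = means;  // a cluster with no samples keeps its old mean
    for (std::size_t k = 0; k < K; ++k) {
        Vec sum(D, 0.0);
        std::size_t count = 0;
        for (const Partial& p : partials) {
            for (std::size_t i = 0; i < D; ++i) sum[i] += p.sums[k][i];
            count += p.counts[k];
        }
        if (count > 0) {
            for (std::size_t i = 0; i < D; ++i) {
                new_means[k][i] = sum[i] / static_cast<double>(count);
            }
        }
    }
    return new_means;
}
```

Calling `kmeans_step(data, means, n_threads)` repeatedly until the means stop changing gives a basic multi-threaded k-means. Keeping the accumulators private to each thread avoids synchronisation inside the loop; only the short reduce step runs serially. An EM iteration could plausibly be split the same way, with each thread accumulating its own partial sufficient statistics.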
