CPU- and GPU-based Distributed Sampling in Dirichlet Process Mixtures for Large-scale Analysis
Or Dinari, Raz Zamir, John W. Fisher III, Oren Freifeld

TL;DR
This paper introduces a scalable, high-performance software package for Dirichlet Process Mixture Model (DPMM) inference. It offers distributed CPU and GPU implementations behind a common, user-friendly interface, which makes it possible to analyze larger and higher-dimensional datasets.
Contribution
It provides the first flexible, distributed CPU and GPU implementations of DPMM inference with a common interface, improving scalability and usability for large-scale data analysis.
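This sketch shows one common way to distribute mixture-model sampling. It is assumed here for illustration and is not taken from the paper. Each worker holds a shard of the data, computes per-cluster sufficient statistics locally, and sends only those statistics to be summed. The example uses 1-D Gaussian clusters with statistics (count, Σx, Σx²). The helper names `local_stats`, `merge_stats`, and `mean_var` are hypothetical. Because these statistics are additive, the merged result matches what you would get by computing them on the full dataset. That property is what allows the per-shard work to be spread across CPU cores, machines, or GPUs.

```python
from collections import defaultdict


def local_stats(xs, labels):
    """Per-cluster sufficient statistics [n, sum x, sum x^2] for one data shard."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for x, k in zip(xs, labels):
        s = stats[k]
        s[0] += 1       # number of points assigned to cluster k
        s[1] += x       # running sum of values
        s[2] += x * x   # running sum of squared values
    return dict(stats)


def merge_stats(shard_stats):
    """Sum per-cluster statistics from several shards (the 'reduce' step)."""
    total = defaultdict(lambda: [0, 0.0, 0.0])
    for stats in shard_stats:
        for k, (n, sx, sxx) in stats.items():
            t = total[k]
            t[0] += n
            t[1] += sx
            t[2] += sxx
    return dict(total)


def mean_var(n, sx, sxx):
    """Cluster mean and (maximum-likelihood) variance from merged statistics; assumes n > 0."""
    mean = sx / n
    return mean, sxx / n - mean * mean


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 10.0, 12.0]
    labels = [0, 0, 0, 1, 1]
    # Two hypothetical workers, each seeing only part of the data.
    shards = [local_stats(xs[:2], labels[:2]), local_stats(xs[2:], labels[2:])]
    merged = merge_stats(shards)
    print(merged == local_stats(xs, labels))  # True: merging loses nothing
```

Only a few numbers per cluster cross between workers, no matter how large each shard is. This is the main reason the pattern scales.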
Findings
Achieves significant speedups over previous implementations.
Enables fitting DPMMs to larger, higher-dimensional datasets.
Provides a user-friendly software package with Python integration.
Abstract
In the realm of unsupervised learning, Bayesian nonparametric mixture models, exemplified by the Dirichlet Process Mixture Model (DPMM), provide a principled approach for adapting the complexity of the model to the data. Such models are particularly useful in clustering tasks where the number of clusters is unknown. Despite their potential and mathematical elegance, DPMMs have yet to become a mainstream tool widely adopted by practitioners. This is arguably due to a misconception that these models scale poorly, as well as the lack of high-performance (and user-friendly) software tools that can handle large datasets efficiently. In this paper, we bridge this practical gap by proposing a new, easy-to-use statistical software package for scalable DPMM inference. More concretely, we provide efficient and easily modifiable implementations for high-performance distributed…
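The Chinese Restaurant Process (CRP) is the sequential view of the Dirichlet process prior, and it makes concrete the claim that a DPMM adapts model complexity to the data. This Python sketch illustrates the prior only. It is not the sampler implemented in the package. The function names `sample_crp` and `expected_num_clusters` are hypothetical, and so is the concentration parameter `alpha` as used here. The sketch draws cluster assignments and computes the expected number of clusters, E[K_n] = Σ_{i=0}^{n-1} α/(α+i). That quantity grows roughly like α log n, so it is not fixed in advance.

```python
import random


def sample_crp(n, alpha, seed=0):
    """Draw cluster labels for n points from a CRP with concentration alpha.

    Point i joins existing cluster k with probability counts[k] / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of points currently in cluster k
    labels = []
    for _ in range(n):
        weights = counts + [alpha]  # existing clusters first, then "new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


def expected_num_clusters(n, alpha):
    """E[K_n] under the CRP prior: sum over i = 0..n-1 of alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        _, counts = sample_crp(n, alpha=1.0, seed=0)
        print(n, len(counts), round(expected_num_clusters(n, 1.0), 2))
```

With `alpha = 1.0`, `expected_num_clusters(n, 1.0)` is the n-th harmonic number. The expected number of clusters therefore keeps growing as data arrive, but only slowly. In a full DPMM the likelihood then reshapes these assignments toward the structure actually present in the data.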
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference · Statistical Methods and Inference
