ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures
Dan Lovell, Jonathan Malmaud, Ryan P. Adams, Vikash K. Mansinghka

TL;DR
This paper introduces a reparameterization of the Dirichlet process that enables parallel MCMC inference, improving computational efficiency without altering the true posterior. The approach is demonstrated on large-scale data.
Contribution
The authors propose a novel reparameterization of the Dirichlet process that induces conditional independencies, allowing parallel MCMC inference without changing the posterior distribution.
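The summary does not spell out the reparameterization. One standard identity that is consistent with it is given here as an illustration, not as the authors' exact construction. The symbols K (number of superclusters), \mu (supercluster proportions), \gamma (supercluster weights), \alpha (concentration), and H (base measure) are notation assumed for this sketch:

```latex
\begin{aligned}
\gamma &\sim \mathrm{Dirichlet}(\alpha\mu_1, \dots, \alpha\mu_K), \qquad \textstyle\sum_{k=1}^{K} \mu_k = 1,\\
G_k &\sim \mathrm{DP}(\alpha\mu_k, H) \quad \text{independently for } k = 1, \dots, K, \text{ and independently of } \gamma,\\
G &= \sum_{k=1}^{K} \gamma_k G_k \;\sim\; \mathrm{DP}(\alpha, H).
\end{aligned}
```

Under this identity the atoms of each G_k are independent of those of the other components, which is the kind of conditional independence that lets per-component updates run separately.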
Findings
Achieves efficient parallel inference on large datasets
Maintains the true posterior distribution during parallelization
Demonstrates scalability on data with over 1 million vectors
Abstract
The Dirichlet process (DP) is a fundamental mathematical tool for Bayesian nonparametric modeling, and is widely used in tasks such as density estimation, natural language processing, and time series modeling. Although MCMC inference methods for the DP often provide a gold standard in terms of asymptotic accuracy, they can be computationally expensive and are not obviously parallelizable. We propose a reparameterization of the Dirichlet process that induces conditional independencies between the atoms that form the random measure. This conditional independence enables many of the Markov chain transition operators for DP inference to be simulated in parallel across multiple cores. Applied to mixture modeling, our approach enables the Dirichlet process to simultaneously learn clusters that describe the data and superclusters that define the granularity of parallelization. Unlike previous…
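To make the cluster/supercluster idea concrete, here is a minimal Python sketch of forward sampling from the prior only. It is not the paper's MCMC sampler. It assumes a two-stage scheme: items are assigned to superclusters using Dirichlet weights, then an independent Chinese restaurant process runs inside each supercluster with concentration alpha * mu_k. The function names and the parameters `alpha` and `mu` are hypothetical, chosen for illustration.

```python
import random
from collections import Counter


def crp_partition(n, alpha, rng):
    """Sample cluster sizes for n items from a Chinese restaurant process."""
    sizes = []
    for i in range(n):
        # Item i opens a new cluster with probability alpha / (i + alpha),
        # otherwise joins an existing cluster with probability proportional to its size.
        u = rng.random() * (i + alpha)
        if u < alpha:
            sizes.append(1)
            continue
        u -= alpha
        for t, s in enumerate(sizes):
            if u < s:
                sizes[t] += 1
                break
            u -= s
        else:
            sizes[-1] += 1  # guard against floating-point round-off
    return sizes


def supercluster_partition(n, alpha, mu, rng):
    """Two-stage sampler (assumed construction): superclusters, then local CRPs."""
    # Supercluster weights gamma ~ Dirichlet(alpha * mu), via normalized gamma draws.
    g = [rng.gammavariate(alpha * m, 1.0) for m in mu]
    total = sum(g)
    gamma = [x / total for x in g]
    # Each item picks a supercluster independently given gamma.
    counts = Counter(rng.choices(range(len(mu)), weights=gamma, k=n))
    # The local CRPs do not interact, so each could run on its own core.
    sizes = []
    for k, m in enumerate(mu):
        sizes.extend(crp_partition(counts[k], alpha * m, rng))
    return sizes


def mean_num_clusters(sampler, trials, rng):
    """Average number of clusters over repeated draws from a sampler."""
    return sum(len(sampler(rng)) for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(0)
    n, alpha, mu = 200, 2.0, [0.25] * 4
    direct = mean_num_clusters(lambda r: crp_partition(n, alpha, r), 300, rng)
    staged = mean_num_clusters(lambda r: supercluster_partition(n, alpha, mu, r), 300, rng)
    print(direct, staged)
```

If the assumed construction holds, the two samplers induce the same distribution over partitions, so the two printed averages should agree up to Monte Carlo error. The loop over superclusters is written sequentially for clarity. Because the local processes share no state once the assignments are fixed, that loop is the natural place to distribute work.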
Taxonomy
Topics: Bayesian Methods and Mixture Models · Markov Chains and Monte Carlo Methods · Data Management and Algorithms
