Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data

Ruohui Wang; Dahua Lin

arXiv:1709.06304 · stat.ML · September 20, 2017

TL;DR

This paper introduces a scalable method for estimating Dirichlet Process Mixture Models on distributed data: new components are created locally on each computing node and then merged probabilistically, achieving high scalability while keeping the estimation consistent at low communication cost.

Contribution

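The paper presents a distributed estimation approach for DPMMs in which each computing node may create new components locally; components that correspond to the same cluster are then identified and merged through a probabilistic consolidation scheme. This keeps the estimation consistent while reducing communication cost.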
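
To make the "create locally, consolidate centrally" pattern concrete, here is a rough sketch. It is not the authors' algorithm: it swaps in a deterministic, DP-means-style hard clustering (the small-variance limit of DPMM inference, where a point opens a new component when its squared distance to every center exceeds a penalty `lam`) and replaces the paper's probabilistic consolidation with a greedy distance-threshold merge. The shard layout, `local_pass`, `consolidate`, and `lam` are all hypothetical choices made for illustration.

```python
import numpy as np


def local_pass(X, global_centers, lam):
    """One DP-means-style pass over a single node's shard (hypothetical helper).

    A point whose squared distance to every known center exceeds `lam`
    spawns a new *local* center; no communication happens during the pass.
    Returns (sum, count) per center: global centers first, then local ones.
    """
    centers = [np.asarray(c, dtype=float) for c in global_centers]
    labels = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        k = -1
        if centers:
            d2 = [float(np.sum((x - c) ** 2)) for c in centers]
            j = int(np.argmin(d2))
            if d2[j] <= lam:
                k = j
        if k < 0:  # too far from every center: open a new local component
            centers.append(x.astype(float))
            k = len(centers) - 1
        labels[i] = k
    return [(X[labels == k].sum(axis=0), int(np.sum(labels == k)))
            for k in range(len(centers))]


def consolidate(node_stats, global_centers, lam):
    """Coordinator step (hypothetical helper): a deterministic, greedy
    stand-in for the paper's probabilistic consolidation."""
    k0 = len(global_centers)
    sums = [np.zeros_like(c, dtype=float) for c in global_centers]
    counts = [0] * k0
    new_stats = []
    for stats in node_stats:
        for k, (s, n) in enumerate(stats):
            if k < k0:  # existing global component: just add statistics
                sums[k] += s
                counts[k] += n
            elif n > 0:  # locally created component: merge it below
                new_stats.append((s, n))
    for s, n in new_stats:
        means = [sums[j] / counts[j] if counts[j] else global_centers[j]
                 for j in range(len(sums))]
        if means:
            d2 = [float(np.sum((s / n - m) ** 2)) for m in means]
            j = int(np.argmin(d2))
            if d2[j] <= lam:  # treated as the same cluster: merge statistics
                sums[j] = sums[j] + s
                counts[j] += n
                continue
        sums.append(np.array(s, dtype=float))  # becomes a new global component
        counts.append(n)
    # Components that received no points this round keep their old center.
    return [sums[j] / counts[j] if counts[j]
            else np.asarray(global_centers[j], dtype=float)
            for j in range(len(sums))]


rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.vstack([m + rng.normal(size=(100, 2)) for m in true_means])
rng.shuffle(X)
shards = np.array_split(X, 4)  # data spread over 4 simulated nodes
centers, lam = [], 16.0
for _ in range(3):
    stats = [local_pass(shard, centers, lam) for shard in shards]
    centers = consolidate(stats, centers, lam)
```

After the loop, `centers` should contain a center near each of the two true means, possibly plus a stray component opened by an outlier point, since this sketch never deletes components. Each node ships only one (sum, count) pair per component per round, so communication scales with the number of components rather than the number of data points.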

Findings

01. Achieves high scalability in distributed environments
02. Maintains estimation consistency with low communication overhead
03. Performs well on large real-world datasets

Abstract

We consider the estimation of Dirichlet Process Mixture Models (DPMMs) in distributed environments, where data are distributed across multiple computing nodes. A key advantage of Bayesian nonparametric models such as DPMMs is that they allow new components to be introduced on the fly as needed. This, however, poses an important challenge to distributed estimation: how to handle new components efficiently and consistently. To tackle this problem, we propose a new estimation method, which allows new components to be created locally in individual computing nodes. Components corresponding to the same cluster will be identified and merged via a probabilistic consolidation scheme. In this way, we can maintain the consistency of estimation with very low communication cost. Experiments on large real-world data sets show that the proposed method can achieve high scalability in distributed and…
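
The ability of DPMMs to "introduce new components on the fly" is easiest to see through the Chinese restaurant process (CRP) view of the Dirichlet process prior: the next point joins an existing component with probability proportional to that component's size, or opens a new one with probability proportional to a concentration parameter `alpha`. The sketch below samples labels from this prior only; a full DPMM sampler would additionally weight each option by the likelihood of the point under that component. The helper names are hypothetical and not taken from the paper.

```python
import numpy as np


def crp_assignment_probs(counts, alpha):
    """CRP prior for the next point: join existing component k with probability
    proportional to counts[k], or open a new component with probability
    proportional to alpha (the last entry of the result)."""
    weights = np.append(np.asarray(counts, dtype=float), alpha)
    return weights / weights.sum()


def sample_crp(n, alpha, seed=0):
    """Draw component labels for n points from the CRP prior alone."""
    rng = np.random.default_rng(seed)
    counts, labels = [], []
    for _ in range(n):
        p = crp_assignment_probs(counts, alpha)
        k = int(rng.choice(len(p), p=p))
        if k == len(counts):  # a brand-new component, created on the fly
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


probs = crp_assignment_probs([3, 1], alpha=1.0)  # [0.6, 0.2, 0.2]
labels, counts = sample_crp(50, alpha=1.0)
```

Because the number of components is not fixed in advance, two nodes running this kind of process independently can each open components that describe the same underlying cluster, which is exactly the situation the consolidation step has to resolve.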
