Scalable Inference for Latent Dirichlet Allocation

James Petterson; Tiberio Caetano

arXiv:0909.4603·cs.LG·September 28, 2009

Open Access

TL;DR

This paper presents a scalable, asynchronous distributed inference method for Latent Dirichlet Allocation that balances speed and accuracy, suitable for heterogeneous computing clusters.

Contribution

It introduces a simple, tunable approximation method for distributed LDA inference that is asynchronous and adaptable to different hardware environments.

Findings

1. Efficient distributed inference with adjustable accuracy
2. Asynchronous approach suitable for heterogeneous clusters
3. Demonstrates scalability and flexibility in LDA learning

Abstract

We investigate the problem of learning a topic model, the well-known Latent Dirichlet Allocation, in a distributed manner, using a cluster of C processors and dividing the corpus to be learned equally among them. We propose a simple approximate method that can be tuned, trading speed for accuracy according to the task at hand. Our approach is asynchronous, and therefore suitable for clusters of heterogeneous machines.
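
To make the setting in the abstract concrete, the following is a minimal Python sketch of approximate distributed collapsed Gibbs sampling for LDA. It is an illustration under stated assumptions, not the authors' algorithm: the abstract gives no algorithmic details, so the sampler, the round-based merge of count deltas, and every name here (`split_corpus`, `gibbs_sweep`, `distributed_round`, `run_demo`, the hyperparameters `alpha` and `beta`) are hypothetical. The workers are simulated sequentially in one process and synchronized once per round, whereas the paper's method is asynchronous. The only elements taken from the abstract are the division of the corpus equally among C workers and the idea that workers sample against approximate (stale) global counts.

```python
import random


def split_corpus(docs, num_workers):
    """Divide documents into num_workers shards whose sizes differ by at most one."""
    return [docs[i::num_workers] for i in range(num_workers)]


def init_state(shard, num_topics, vocab_size, rng):
    """Assign a random topic to every token and build the count tables."""
    assignments = [[rng.randrange(num_topics) for _ in doc] for doc in shard]
    doc_topic = [[0] * num_topics for _ in shard]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    for d, doc in enumerate(shard):
        for w, z in zip(doc, assignments[d]):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
    return assignments, doc_topic, topic_word


def gibbs_sweep(shard, assignments, doc_topic, topic_word, alpha, beta, rng):
    """One collapsed Gibbs sweep over a shard; count tables are updated in place."""
    num_topics = len(topic_word)
    vocab_size = len(topic_word[0])
    topic_totals = [sum(row) for row in topic_word]
    for d, doc in enumerate(shard):
        for i, w in enumerate(doc):
            # Remove the token's current assignment from the counts.
            z = assignments[d][i]
            doc_topic[d][z] -= 1
            topic_word[z][w] -= 1
            topic_totals[z] -= 1
            # Conditional topic weights given all other assignments.
            weights = [
                (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                / (topic_totals[k] + vocab_size * beta)
                for k in range(num_topics)
            ]
            z = rng.choices(range(num_topics), weights=weights)[0]
            assignments[d][i] = z
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_totals[z] += 1


def distributed_round(shards, states, global_tw, alpha, beta, rng):
    """Each worker samples against a stale copy of the global counts, then deltas are merged."""
    deltas = []
    for shard, (assignments, doc_topic) in zip(shards, states):
        local = [row[:] for row in global_tw]  # stale snapshot for this worker
        gibbs_sweep(shard, assignments, doc_topic, local, alpha, beta, rng)
        deltas.append([[l - g for l, g in zip(lrow, grow)]
                       for lrow, grow in zip(local, global_tw)])
    for delta in deltas:
        for k, row in enumerate(delta):
            for w, change in enumerate(row):
                global_tw[k][w] += change


def run_demo(num_workers=3, num_topics=2, vocab_size=6, rounds=5, seed=0):
    """Run a few rounds on a tiny synthetic corpus (10 documents of 8 tokens each)."""
    rng = random.Random(seed)
    docs = [[rng.randrange(vocab_size) for _ in range(8)] for _ in range(10)]
    shards = split_corpus(docs, num_workers)
    global_tw = [[0] * vocab_size for _ in range(num_topics)]
    states = []
    for shard in shards:
        assignments, doc_topic, topic_word = init_state(shard, num_topics, vocab_size, rng)
        states.append((assignments, doc_topic))
        for k in range(num_topics):
            for w in range(vocab_size):
                global_tw[k][w] += topic_word[k][w]
    for _ in range(rounds):
        distributed_round(shards, states, global_tw, alpha=0.1, beta=0.01, rng=rng)
    return shards, global_tw
```

In this sketch the approximation comes from each worker ignoring the changes other workers make during the same round. Merging less often would reduce communication at the cost of staler counts, which is one plausible reading of the speed-versus-accuracy trade-off the abstract mentions.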

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Bayesian Methods and Mixture Models