Sparse Stochastic Inference for Latent Dirichlet allocation

David Mimno (Princeton University); Matt Hoffman (Columbia University); David Blei (Princeton University)

arXiv:1206.6425 · cs.LG · July 3, 2012 · ICML · 109 cites

TL;DR

This paper introduces a hybrid stochastic inference algorithm for Bayesian topic models that scales to large datasets, reduces the bias of variational inference, and applies to many Bayesian hidden-variable models.

Contribution

The paper proposes a novel hybrid algorithm combining sparse Gibbs sampling with online stochastic inference for scalable Bayesian topic modeling.
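One way to picture how sparse Gibbs sampling and online stochastic inference fit together is a minibatch loop. The sketch below is illustrative and rests on stated assumptions; it is not the authors' implementation. It assumes the standard stochastic variational update for the topic parameters λ with step size ρ_t = (τ0 + t)^(−κ), runs a Gibbs sampler over each document's topic assignments while the topics are held fixed at exp(E[log β]), and uses the averaged sampled counts in place of expected sufficient statistics. All function and parameter names (`hybrid_lda`, `sample_document`, `tau0`, `kappa`, `sweeps`, and so on) are hypothetical. For readability it samples each token in O(K) time and updates λ densely, so it omits the sparse bookkeeping the method depends on for efficiency.

```python
import math
import random


def digamma(x):
    """Digamma for x > 0: shift x upward by recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def expected_log_beta(lam):
    """E[log beta_kw] under a Dirichlet(lambda_k) distribution for every topic row k."""
    out = []
    for row in lam:
        norm = digamma(sum(row))
        out.append([digamma(x) - norm for x in row])
    return out


def sample_document(doc, elog_beta, alpha, num_topics, sweeps, rng):
    """Gibbs-sample topic assignments for one document with the global topics held fixed.

    Returns {(topic, word): count} averaged over the post-burn-in sweeps.
    """
    weights = {w: [math.exp(elog_beta[k][w]) for k in range(num_topics)] for w in set(doc)}
    z = [rng.randrange(num_topics) for _ in doc]
    doc_counts = [0] * num_topics
    for k in z:
        doc_counts[k] += 1
    stats = {}
    burn_in = sweeps // 2
    for sweep in range(sweeps):
        for i, w in enumerate(doc):
            doc_counts[z[i]] -= 1  # remove the token's current assignment
            probs = [(alpha + doc_counts[k]) * weights[w][k] for k in range(num_topics)]
            k_new = rng.choices(range(num_topics), weights=probs)[0]
            z[i] = k_new
            doc_counts[k_new] += 1
            if sweep >= burn_in:
                stats[(k_new, w)] = stats.get((k_new, w), 0.0) + 1.0
    kept = sweeps - burn_in
    return {key: v / kept for key, v in stats.items()}


def update_topics(lam, batch_stats, eta, corpus_size, batch_size, rho):
    """lambda <- (1 - rho) * lambda + rho * (eta + (D / |B|) * sampled counts)."""
    scale = corpus_size / batch_size
    for k, row in enumerate(lam):
        for w in range(len(row)):
            target = eta + scale * batch_stats.get((k, w), 0.0)
            row[w] = (1 - rho) * row[w] + rho * target


def hybrid_lda(corpus, vocab_size, num_topics, batch_size=2, iterations=20,
               alpha=0.1, eta=0.01, tau0=1.0, kappa=0.7, sweeps=10, seed=0):
    rng = random.Random(seed)
    lam = [[rng.gammavariate(100.0, 0.01) for _ in range(vocab_size)]
           for _ in range(num_topics)]
    for t in range(iterations):
        batch = rng.sample(corpus, batch_size)
        elog_beta = expected_log_beta(lam)
        batch_stats = {}
        for doc in batch:
            for key, v in sample_document(doc, elog_beta, alpha, num_topics, sweeps, rng).items():
                batch_stats[key] = batch_stats.get(key, 0.0) + v
        rho = (tau0 + t) ** (-kappa)  # decreasing step size
        update_topics(lam, batch_stats, eta, len(corpus), batch_size, rho)
    return lam


if __name__ == "__main__":
    # Toy corpus of word ids over a 6-word vocabulary (hypothetical data).
    corpus = [[0, 1, 0, 1, 2], [1, 0, 2, 1], [3, 4, 3, 5], [4, 5, 3, 4]]
    topics = hybrid_lda(corpus, vocab_size=6, num_topics=2)
    for k, row in enumerate(topics):
        print(k, [round(x, 2) for x in row])
```

The design choice this illustrates is that only the local (per-document) variables are sampled. The global topics remain a variational distribution, updated with a noisy step from each minibatch.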

Findings

01. Analyzed a corpus of 1.2 million books (33 billion words) with thousands of topics
02. Reduced bias compared to variational inference
03. Generalizes to many Bayesian hidden-variable models

Abstract

We present a hybrid algorithm for Bayesian topic models that combines the efficiency of sparse Gibbs sampling with the scalability of online stochastic inference. We used our algorithm to analyze a corpus of 1.2 million books (33 billion words) with thousands of topics. Our approach reduces the bias of variational inference and generalizes to many Bayesian hidden-variable models.
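The "sparse Gibbs sampling" in the abstract is about making each per-token draw cheap. The sketch below illustrates one common way to achieve this: a SparseLDA-style split of the sampling mass into a small document-specific bucket and a per-word bucket whose total can be cached across documents. It assumes the unnormalized conditional p(z = k) ∝ (α + n_dk) w_k, where w_k is the current topic-word weight for the token's word (for example exp(E[log β_kw])). The function names and this exact decomposition are illustrative assumptions, not a transcription of the paper's sampler.

```python
import bisect
import random
from itertools import accumulate


def build_word_cache(word_weights, alpha):
    """Per-word cache shared by all documents: cumulative weights and the smoothing mass.

    word_weights[k] stands for this word's weight in topic k (assumed positive).
    """
    cumulative = list(accumulate(word_weights))
    return cumulative, alpha * cumulative[-1]


def sparse_sample(doc_topic_counts, word_weights, cache, alpha, rng):
    """Draw topic k with probability proportional to (alpha + n_dk) * w_k.

    The mass splits into a document bucket, sum_k n_dk * w_k, which only visits
    topics present in the document, and a smoothing bucket, alpha * sum_k w_k,
    whose total is read from the per-word cache.
    """
    cumulative, smoothing_mass = cache
    doc_mass = sum(n * word_weights[k] for k, n in doc_topic_counts.items())
    u = rng.random() * (doc_mass + smoothing_mass)
    if u < doc_mass:
        # Document bucket: walk only the document's nonzero topic counts.
        last = None
        for k, n in doc_topic_counts.items():
            last = k
            u -= n * word_weights[k]
            if u <= 0:
                return k
        return last  # guard against floating-point round-off
    # Smoothing bucket: binary search over the cached cumulative weights.
    target = (u - doc_mass) / alpha
    return min(bisect.bisect_left(cumulative, target), len(cumulative) - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.5, 0.3, 0.2]  # hypothetical w_k for one word, K = 3
    cache = build_word_cache(weights, alpha=0.1)
    doc_counts = {2: 3}  # sparse: this document currently uses only topic 2
    draws = [sparse_sample(doc_counts, weights, cache, 0.1, rng) for _ in range(10000)]
    print([draws.count(k) / len(draws) for k in range(3)])
```

In this usage example, the unnormalized masses are 0.05, 0.03 and 0.62, so the printed frequencies should come out close to 0.05/0.7, 0.03/0.7 and 0.62/0.7. When documents use few topics, the document bucket stays short even for large K.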

Taxonomy

Topics: Bayesian Methods and Mixture Models · Topic Modeling · Computational and Text Analysis Methods