Sparse Stochastic Inference for Latent Dirichlet Allocation
David Mimno (Princeton University), Matt Hoffman (Columbia University), David Blei (Princeton University)

TL;DR
This paper introduces a hybrid stochastic inference algorithm for Bayesian topic models that scales to a corpus of 1.2 million books, reduces the bias of variational inference, and generalizes to many Bayesian hidden-variable models.
Contribution
The paper proposes a novel hybrid algorithm combining sparse Gibbs sampling with online stochastic inference for scalable Bayesian topic modeling.
Findings
Analyzed a corpus of 1.2 million books (33 billion words) with thousands of topics
Reduced bias compared to variational inference
Generalized to multiple Bayesian hidden-variable models
Abstract
We present a hybrid algorithm for Bayesian topic models that combines the efficiency of sparse Gibbs sampling with the scalability of online stochastic inference. We used our algorithm to analyze a corpus of 1.2 million books (33 billion words) with thousands of topics. Our approach reduces the bias of variational inference and generalizes to many Bayesian hidden-variable models.
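The hybrid idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. For each minibatch, topic assignments are drawn by Gibbs sampling within each document, the sampled (topic, word) counts are kept in a sparse dictionary, and the topic-word parameters take an online stochastic step toward the rescaled counts. The function names, the hyperparameters alpha and eta, the step-size schedule rho = (tau0 + t)^(-kappa), and the burn-in and sample counts are all assumptions chosen for illustration. The sampler below also scores every topic densely for each token; the sparse sampling speedups that the abstract refers to are not reproduced here.

```python
import math
import random


def digamma(x):
    """Approximate digamma(x) for x > 0 using recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def sample_doc_counts(doc, lam, alpha, rng, burn_in=5, samples=5):
    """Gibbs-sample topic assignments for one document (a list of word ids).

    Returns a sparse dict {(topic, word): average count over saved sweeps}.
    """
    num_topics = len(lam)
    lam_sums = [sum(row) for row in lam]
    z = [rng.randrange(num_topics) for _ in doc]
    n_dk = [0] * num_topics
    for k in z:
        n_dk[k] += 1
    counts = {}
    for sweep in range(burn_in + samples):
        for i, w in enumerate(doc):
            n_dk[z[i]] -= 1  # remove the current token from the counts
            # Assumed form: (alpha + n_dk) * exp(E[log beta_kw]) under the
            # current Dirichlet parameters lam (dense over topics).
            weights = [
                (alpha + n_dk[k])
                * math.exp(digamma(lam[k][w]) - digamma(lam_sums[k]))
                for k in range(num_topics)
            ]
            z[i] = rng.choices(range(num_topics), weights=weights)[0]
            n_dk[z[i]] += 1
        if sweep >= burn_in:
            for i, w in enumerate(doc):
                key = (z[i], w)
                counts[key] = counts.get(key, 0.0) + 1.0 / samples
    return counts


def update_topics(lam, batch, num_docs, t, alpha=0.1, eta=0.01,
                  tau0=1.0, kappa=0.6, rng=None):
    """One online step: blend lam with the minibatch estimate, in place."""
    rng = rng or random.Random(0)
    rho = (tau0 + t) ** (-kappa)    # decaying step size (assumed schedule)
    scale = num_docs / len(batch)   # rescale the minibatch to corpus size
    batch_counts = {}
    for doc in batch:
        for key, c in sample_doc_counts(doc, lam, alpha, rng).items():
            batch_counts[key] = batch_counts.get(key, 0.0) + c
    for k, row in enumerate(lam):
        for w in range(len(row)):
            target = eta + scale * batch_counts.get((k, w), 0.0)
            row[w] = (1.0 - rho) * row[w] + rho * target
    return lam


if __name__ == "__main__":
    # Toy corpus: 2 topics, vocabulary of 4 word ids, 2 documents.
    lam = [[1.0] * 4 for _ in range(2)]
    batch = [[0, 0, 1], [2, 3, 3]]
    update_topics(lam, batch, num_docs=2, t=1, rng=random.Random(42))
    for row in lam:
        print([round(v, 3) for v in row])
```

The dictionary of counts only stores (topic, word) pairs that were actually sampled, so memory per minibatch grows with the number of tokens rather than with topics times vocabulary size. The dense loop over all entries of lam in update_topics is kept for clarity and would need a lazy or sparse update to be practical with thousands of topics.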
Taxonomy
Topics: Bayesian Methods and Mixture Models · Topic Modeling · Computational and Text Analysis Methods
