Blocking Collapsed Gibbs Sampler for Latent Dirichlet Allocation Models

Xin Zhang; Scott A. Sisson

arXiv:1608.00945 · stat.CO · August 3, 2016 · 1 citation

Open Access

TL;DR

This paper introduces a blocking scheme for the collapsed Gibbs sampler in LDA models, significantly improving mixing efficiency and reducing computation time for large numbers of topics.

Contribution

The paper proposes a novel blocking scheme with theoretical guarantees that enhances sampling efficiency and computational speed in LDA inference.

Findings

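01 Substantial improvement in chain mixing efficiency over the single-site collapsed Gibbs sampler.

02 Significant reduction in computation time for models with many topics.

03 Two procedures for sampling the latent variables directly within each block: an O(K)-step backward simulation and an O(log K)-step nested simulation.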

Abstract

The latent Dirichlet allocation (LDA) model is a widely used latent variable model in machine learning for text analysis. Inference for this model typically involves a single-site collapsed Gibbs sampling step for latent variables associated with observations. The efficiency of the sampling is critical to the success of the model in practical large-scale applications. In this article, we introduce a blocking scheme to the collapsed Gibbs sampler for the LDA model which can, with a theoretical guarantee, improve chain mixing efficiency. We develop two procedures, an O(K)-step backward simulation and an O(log K)-step nested simulation, to directly sample the latent variables within each block. We demonstrate that the blocking scheme achieves substantial improvements in chain mixing compared to the state-of-the-art single-site collapsed Gibbs sampler. We also show that when the number of…
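
For orientation, the single-site collapsed Gibbs sampler that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration of the standard textbook update, not the blocking scheme proposed in the paper, whose details are not given on this page. The function name `collapsed_gibbs_lda`, the symmetric prior values `alpha` and `beta`, and the toy corpus are all assumptions made for the example.

```python
import random


def collapsed_gibbs_lda(docs, n_topics, vocab_size,
                        alpha=0.1, beta=0.01, n_sweeps=20, seed=0):
    """Single-site collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    alpha, beta: assumed symmetric Dirichlet hyperparameters (hypothetical values).
    """
    rng = random.Random(seed)

    # Sufficient statistics: document-topic, topic-word, and topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = [rng.randrange(n_topics) for _ in doc]
        z.append(z_d)
        for w, k in zip(doc, z_d):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1

                # Unnormalised full conditional over topics for this one token.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]

                # Draw a new topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    return z, n_dk, n_kw


# Toy usage: two short documents over a five-word vocabulary.
docs = [[0, 1, 2, 0], [3, 4, 3]]
z, n_dk, n_kw = collapsed_gibbs_lda(docs, n_topics=2, vocab_size=5, n_sweeps=5)
```

Each token is resampled one at a time, conditional on all other assignments, and every draw costs O(K) in the number of topics. This per-token, per-topic cost is what makes the efficiency of the update matter at scale.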

Taxonomy

Topics: Bayesian Methods and Mixture Models · Markov Chains and Monte Carlo Methods · Gaussian Processes and Bayesian Inference

Methods: Latent Dirichlet Allocation