Dense Distributions from Sparse Samples: Improved Gibbs Sampling Parameter Estimators for LDA
Yannis Papanikolaou, James R. Foulds, Timothy N. Rubin, Grigorios Tsoumakas

TL;DR
This paper presents a new method for estimating LDA parameters from Gibbs samples by leveraging full conditional distributions, improving accuracy with minimal additional computational cost, and outperforming traditional methods in experiments.
Contribution
It introduces a novel averaging technique for Gibbs sampling in LDA that combines advantages of CVB0 and CGS, applicable to existing implementations.
Findings
Consistent improvement over standard CGS in all tested conditions.
Outperforms CVB0 inference in most experimental scenarios.
Highlights the benefits of averaging over multiple samples for LDA estimation.
Abstract
We introduce a novel approach for estimating Latent Dirichlet Allocation (LDA) parameters from collapsed Gibbs samples (CGS), by leveraging the full conditional distributions over the latent variable assignments to efficiently average over multiple samples, for little more computational cost than drawing a single additional collapsed Gibbs sample. Our approach can be understood as adapting the soft clustering methodology of Collapsed Variational Bayes (CVB0) to CGS parameter estimation, in order to get the best of both techniques. Our estimators can straightforwardly be applied to the output of any existing implementation of CGS, including modern accelerated variants. We perform extensive empirical comparisons of our estimators with those of standard collapsed inference algorithms on real-world data for both unsupervised LDA and Prior-LDA, a supervised variant of LDA for multi-label…
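The core idea in the abstract, replacing each token's hard topic assignment with its full conditional distribution before forming the estimates, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `soft_estimates`, the symmetric hyperparameters `alpha` and `beta`, and the list-of-lists data layout are all assumptions made for the example. It takes one collapsed Gibbs sample `z`, computes the full conditional over topics for every token with that token's own count excluded, and accumulates those probabilities as dense "soft" counts.

```python
import numpy as np

def soft_estimates(docs, z, K, V, alpha=0.1, beta=0.01):
    """Estimate theta (doc-topic) and phi (topic-word) from one CGS sample.

    docs: list of lists of word ids in [0, V)
    z:    list of lists of topic ids in [0, K), aligned with docs
    alpha, beta: symmetric Dirichlet hyperparameters (assumed for this sketch)
    """
    D = len(docs)
    # Hard counts from the Gibbs sample.
    n_dk = np.zeros((D, K))
    n_kw = np.zeros((K, V))
    for d, (words, topics) in enumerate(zip(docs, z)):
        for w, k in zip(words, topics):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
    n_k = n_kw.sum(axis=1)

    # Soft counts: sum of full conditionals instead of one-hot assignments.
    soft_dk = np.zeros((D, K))
    soft_kw = np.zeros((K, V))
    for d, (words, topics) in enumerate(zip(docs, z)):
        for w, k in zip(words, topics):
            excl = np.zeros(K)
            excl[k] = 1.0  # exclude the current token from its own counts
            p = ((n_kw[:, w] - excl + beta) / (n_k - excl + V * beta)
                 * (n_dk[d] - excl + alpha))
            p /= p.sum()
            soft_dk[d] += p
            soft_kw[:, w] += p

    # Smoothed, normalized estimates built from the soft counts.
    theta = (soft_dk + alpha) / (soft_dk.sum(axis=1, keepdims=True) + K * alpha)
    phi = (soft_kw + beta) / (soft_kw.sum(axis=1, keepdims=True) + V * beta)
    return theta, phi
```

The extra pass touches each token once and evaluates the same conditional a Gibbs sweep would, which is consistent with the abstract's claim that the cost is comparable to drawing one additional sample. Because only `docs` and `z` are needed, a sketch like this could be run on the output of any existing sampler.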
Taxonomy
Topics: Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference · Statistical Methods and Inference
Methods: Latent Dirichlet Allocation
