On Smoothing and Inference for Topic Models

Arthur Asuncion; Max Welling; Padhraic Smyth; Yee Whye Teh

arXiv:1205.2662·cs.LG·May 14, 2012·452 cites

TL;DR

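This paper compares learning algorithms for topic models, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation. It finds that their differences stem mainly from how much smoothing each applies to the counts, and shows how computationally efficient methods can quickly learn accurate models on large text datasets.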

Contribution

The paper provides a detailed empirical comparison of topic modeling algorithms, highlighting how smoothing and hyperparameter optimization affect their performance.

Findings

01 Differences among algorithms are mainly due to smoothing levels.

02 Optimized hyperparameters reduce performance disparities.

03 Accurate topic models can be learned in seconds on large corpora.

Abstract

Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical comparisons. In this paper, we highlight the close connections between these approaches. We find that the main differences are attributable to the amount of smoothing applied to the counts. When the hyperparameters are optimized, the differences in performance among the algorithms diminish significantly. The ability of these algorithms to achieve solutions of comparable accuracy gives us the freedom to select computationally efficient approaches. Using the insights gained from this comparative study, we show how accurate topic models can…
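
The claim that the algorithms differ mainly in how much they smooth the counts can be illustrated with a small sketch. The update forms below are the commonly cited per-token expressions for these algorithm families, not text from this page. The helper `topic_weights`, its argument names, and the toy counts are hypothetical. The `vb_approx` branch relies on the approximation exp(digamma(x)) ≈ x - 0.5, and the counts are assumed to already exclude the current token where an algorithm requires that.

```python
import numpy as np

# Hypothetical helper for illustration; not taken from the paper's code.
def topic_weights(n_wk, n_k, n_kj, alpha, eta, vocab_size, method):
    """Normalized topic weights for one token of word w in document j.

    n_wk: count of word w assigned to each topic k
    n_k:  total token count assigned to each topic k
    n_kj: count of tokens in document j assigned to each topic k
    The methods differ only in the offsets added to these counts.
    """
    if method == "cvb0":          # also the form used by collapsed Gibbs sampling
        word_off, total_off, doc_off = 0.0, 0.0, 0.0
    elif method == "vb_approx":   # assumes exp(digamma(x)) ~ x - 0.5
        word_off, total_off, doc_off = -0.5, -0.5, -0.5
    elif method == "map":
        word_off, total_off, doc_off = -1.0, -float(vocab_size), -1.0
    else:
        raise ValueError(f"unknown method: {method}")

    word_term = (n_wk + eta + word_off) / (n_k + vocab_size * eta + total_off)
    doc_term = n_kj + alpha + doc_off
    if np.any(word_term <= 0) or np.any(doc_term <= 0):
        raise ValueError("hyperparameters too small for this method's offsets")
    weights = word_term * doc_term
    return weights / weights.sum()

# Toy counts for three topics (made up for this example).
n_wk = np.array([3.0, 0.0, 10.0])
n_k = np.array([50.0, 40.0, 120.0])
n_kj = np.array([2.0, 5.0, 1.0])
W = 1000  # vocabulary size

cvb0 = topic_weights(n_wk, n_k, n_kj, alpha=0.1, eta=0.01, vocab_size=W, method="cvb0")
# Shifting both hyperparameters by +1 cancels MAP's -1 offsets.
map_shifted = topic_weights(n_wk, n_k, n_kj, alpha=1.1, eta=1.01, vocab_size=W, method="map")
```

Under these assumptions, `map_shifted` matches `cvb0` up to floating-point error, which is one way to see why the performance gap narrows once hyperparameters are tuned separately for each algorithm.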

Code & Models

Repositories

AhmedHlel/soen691-topk-topics

Taxonomy

Topics: Bayesian Methods and Mixture Models · Topic Modeling · Music and Audio Processing