On Estimation and Selection for Topic Models

Matthew A. Taddy

arXiv:1109.4518 · stat.AP · December 30, 2011 · AISTATS · 83 cites

TL;DR

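This paper describes posterior maximization for topic models under a non-standard parametrization, uses the fitted parameters in a new marginal likelihood estimate (built on a block-diagonal approximation to the information matrix) to choose the number of latent topics, and complements that selection with a goodness-of-fit analysis based on estimated residual dispersion.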

Contribution

It presents a non-standard parametrization for posterior maximization and a new likelihood-based method for determining the optimal number of latent topics.

Findings

01 Effective model selection demonstrated through examples

02 Improved estimation accuracy over standard techniques

03 Goodness-of-fit analysis enhances model validation

Abstract

This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This likelihood-based model selection is complemented with a goodness-of-fit analysis built around estimated residual dispersion. Examples are provided to illustrate model selection as well as to compare our estimation against standard alternative techniques.
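
The marginal likelihood step in the abstract can be illustrated with a generic Laplace approximation, log p(X | K) ≈ log p(X, θ̂ | K) + (d/2) log 2π − (1/2) log |H|, where θ̂ is the posterior mode, d is the number of free parameters, and H is the negative Hessian (information matrix) at θ̂. When H is treated as block-diagonal, log |H| is the sum of the per-block log-determinants. The sketch below assumes exactly this textbook form; the function name `laplace_log_evidence` and its inputs are hypothetical, and the paper's actual parametrization and choice of blocks are not given in the abstract.

```python
import numpy as np


def laplace_log_evidence(log_joint_at_mode, hessian_blocks):
    """Laplace estimate of a log marginal likelihood.

    log_joint_at_mode: log p(X, theta_hat), evaluated at the posterior mode.
    hessian_blocks: iterable of square arrays, the diagonal blocks of the
        negative Hessian of the log joint at the mode (assumed block-diagonal).
    """
    dim = 0
    log_det = 0.0
    for block in hessian_blocks:
        block = np.asarray(block, dtype=float)
        # Cholesky raises LinAlgError if the block is not positive definite,
        # which is what the approximation needs at a proper mode.
        chol = np.linalg.cholesky(block)
        # log|H_b| = 2 * sum(log(diag(chol))) for H_b = chol @ chol.T
        log_det += 2.0 * np.sum(np.log(np.diag(chol)))
        dim += block.shape[0]
    return log_joint_at_mode + 0.5 * dim * np.log(2.0 * np.pi) - 0.5 * log_det
```

Under these assumptions, one would fit the model for each candidate number of topics K, evaluate this quantity at each fitted mode, and prefer the K with the largest value. Working block by block avoids forming or factorizing the full information matrix.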

Taxonomy

Topics: Bayesian Methods and Mixture Models · Computational and Text Analysis Methods · Topic Modeling