On Estimation and Selection for Topic Models
Matthew A. Taddy

TL;DR
This paper develops posterior maximization for estimating topic models, a new marginal likelihood approximation for selecting the number of topics, and a goodness-of-fit analysis based on residual dispersion, with examples comparing the estimation against standard alternative techniques.
Contribution
The paper shows that posterior maximization under a non-standard parametrization yields computational and conceptual gains, and uses the fitted parameters as the basis for a likelihood-based method for choosing the number of latent topics.
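As a rough illustration of posterior maximization for a topic model, here is a minimal sketch that fits document weights omega_i and topics theta_k by EM, assuming multinomial word counts with Dirichlet priors. The parametrization is an assumption and is not necessarily the paper's. We take the mode on the softmax (multinomial logit) scale, where the Jacobian of the transformation cancels the usual "- 1" in the Dirichlet exponent, so prior pseudo-counts alpha enter the M-step directly. On the ordinary simplex scale a Dirichlet with alpha < 1 has no interior mode, which is one plausible reason a different scale can help. The names map_topics and log_posterior and the default alpha values are hypothetical.

```python
import numpy as np

def log_posterior(X, omega, theta, alpha_theta, alpha_omega):
    """Log posterior (up to a constant) on the softmax scale.

    Each theta_k ~ Dirichlet(alpha_theta) and omega_i ~ Dirichlet(alpha_omega);
    on the logit scale the prior contributes alpha * log(prob) per entry.
    """
    Q = omega @ theta.T                      # (n, p) fitted word probabilities
    mask = X > 0
    loglik = np.sum(X[mask] * np.log(Q[mask]))
    return loglik + alpha_theta * np.log(theta).sum() + alpha_omega * np.log(omega).sum()

def map_topics(X, K, alpha_theta=0.01, alpha_omega=0.1, n_iter=200, seed=0):
    """MAP fit of a multinomial topic model by EM (illustrative sketch).

    X: (n, p) array of word counts. Returns omega (n, K) with rows on the
    simplex, theta (p, K) with columns on the simplex, and the log posterior
    recorded after every iteration.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    omega = rng.dirichlet(np.ones(K), size=n)       # random start, rows sum to 1
    theta = rng.dirichlet(np.ones(p), size=K).T     # random start, columns sum to 1
    history = []
    for _ in range(n_iter):
        Q = omega @ theta.T
        R = np.where(X > 0, X / Q, 0.0)             # x_ij / q_ij, zero where x_ij = 0
        # E-step folded in: expected topic-word and document-topic counts,
        # both computed from the current (old) omega and theta.
        theta_counts = theta * (R.T @ omega)        # (p, K)
        omega_counts = omega * (R @ theta)          # (n, K)
        # M-step: add pseudo-counts without "- 1" (mode on the softmax scale).
        theta = theta_counts + alpha_theta
        theta /= theta.sum(axis=0, keepdims=True)
        omega = omega_counts + alpha_omega
        omega /= omega.sum(axis=1, keepdims=True)
        history.append(log_posterior(X, omega, theta, alpha_theta, alpha_omega))
    return omega, theta, np.array(history)
```

Because each M-step exactly maximizes the expected complete-data log posterior, the recorded objective should never decrease, which gives a cheap check on an implementation.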
Findings
Effective model selection demonstrated through examples
Improved estimation accuracy over standard techniques
Goodness-of-fit analysis enhances model validation
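The goodness-of-fit analysis is built around estimated residual dispersion. Below is a minimal sketch, assuming a Pearson-type estimator with binomial cell variances m_i q_ij (1 - q_ij), where m_i is the length of document i and q_i = Theta omega_i are its fitted word probabilities. The paper's exact estimator, including which cells are included and how degrees of freedom are counted, may differ. The name residual_dispersion and the df argument are hypothetical.

```python
import numpy as np

def residual_dispersion(X, omega, theta, df):
    """Pearson-type dispersion estimate for a fitted multinomial topic model.

    X: (n, p) counts; omega: (n, K) topic weights; theta: (p, K) topics;
    df: number of estimated parameters subtracted from the number of cells.
    Assumes every fitted q_ij lies strictly between 0 and 1.
    """
    m = X.sum(axis=1, keepdims=True)        # document lengths m_i
    Q = omega @ theta.T                     # fitted probabilities q_ij
    expected = m * Q                        # E[x_ij] = m_i q_ij
    var = m * Q * (1.0 - Q)                 # marginal binomial variance of each cell
    resid2 = (X - expected) ** 2 / var      # squared Pearson residuals
    return resid2.sum() / (X.size - df)
```

Under a correctly specified model the estimate should sit near 1; values well above 1 point to overdispersion relative to the multinomial.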
Abstract
This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This likelihood-based model selection is complemented with a goodness-of-fit analysis built around estimated residual dispersion. Examples are provided to illustrate model selection as well as to compare our estimation against standard alternative techniques.
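The marginal likelihood step can be sketched as a Laplace approximation. With d free parameters and H the negative Hessian of the log joint at the posterior mode, log p(X | K) is approximately log p(X | Omega_hat, Theta_hat) + log pi(Omega_hat, Theta_hat) + (d/2) log(2 pi) - (1/2) log|H|, and treating H as block diagonal reduces log|H| to a sum of per-block log determinants. The choice of blocks (for example one per topic and one per document), the use of unconstrained coordinates (such as p - 1 free values per topic on a logit scale), and the names laplace_log_marginal and choose_K are assumptions for illustration.

```python
import numpy as np

def laplace_log_marginal(log_joint_at_mode, hessian_blocks):
    """Laplace approximation to log p(X) with a block-diagonal information matrix.

    log_joint_at_mode: log likelihood plus log prior at the posterior mode.
    hessian_blocks: list of square arrays, the negative Hessian of the log
      joint for each parameter block in unconstrained coordinates; cross-block
      terms are treated as zero.
    """
    d = sum(H.shape[0] for H in hessian_blocks)
    logdet = 0.0
    for H in hessian_blocks:
        sign, ld = np.linalg.slogdet(H)
        if sign <= 0:
            # Guard: a non-positive determinant means H is not positive definite.
            raise ValueError("each block must be positive definite at the mode")
        logdet += ld
    return log_joint_at_mode + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet

def choose_K(fits):
    """fits: dict mapping K -> (log_joint_at_mode, hessian_blocks).

    Returns the K with the largest approximate log marginal likelihood and
    the full dictionary of scores.
    """
    scores = {K: laplace_log_marginal(lj, blocks) for K, (lj, blocks) in fits.items()}
    return max(scores, key=scores.get), scores
```

Since the Laplace approximation is exact when the log joint is quadratic, a Gaussian toy case is a convenient sanity check. In a topic model, each candidate K would be fit separately, for instance by posterior maximization, before its score is computed.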
Taxonomy
Topics: Bayesian Methods and Mixture Models · Computational and Text Analysis Methods · Topic Modeling
