A Double Parametric Bootstrap Test for Topic Models
Skyler Seto, Sarah Tan, Giles Hooker, and Martin T. Wells

TL;DR
This paper introduces a double parametric bootstrap test to evaluate the fit of NMF-based topic models, addressing likelihood assumption violations in real document corpora.
Contribution
It proposes a double parametric bootstrap test that uses the duality between KL divergence minimization and Poisson maximum likelihood estimation to assess whether an NMF-based topic model fits a corpus reliably.
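The duality mentioned here can be written out in a few lines. The notation below (count matrix $X$, factors $W$ and $H$) is assumed for illustration and is not taken from the paper itself. If each entry is modeled as $X_{ij} \sim \mathrm{Poisson}((WH)_{ij})$, the log-likelihood and the generalized KL divergence differ only by a term that does not depend on $W$ or $H$:

```latex
\begin{align*}
\ell(W, H) &= \sum_{i,j} \Big[ X_{ij} \log (WH)_{ij} - (WH)_{ij} - \log X_{ij}! \Big], \\
D_{\mathrm{KL}}(X \,\|\, WH) &= \sum_{i,j} \Big[ X_{ij} \log \frac{X_{ij}}{(WH)_{ij}} - X_{ij} + (WH)_{ij} \Big], \\
D_{\mathrm{KL}}(X \,\|\, WH) &= -\ell(W, H) + \underbrace{\sum_{i,j} \Big[ X_{ij} \log X_{ij} - X_{ij} - \log X_{ij}! \Big]}_{\text{constant in } W,\, H}.
\end{align*}
```

Minimizing the KL objective over non-negative $W$ and $H$ therefore gives the same factors as maximizing the Poisson likelihood, which is what makes a parametric (Poisson) bootstrap a natural reference distribution for the fit.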
Findings
The test correctly identifies well-fitting models in simulations
It remains effective on real-world document datasets
It targets the likelihood assumptions of NMF, which real corpora often violate
Abstract
Non-negative matrix factorization (NMF) is a technique for finding latent representations of data. The method has been applied to corpora to construct topic models. However, NMF has likelihood assumptions which are often violated by real document corpora. We present a double parametric bootstrap test for evaluating the fit of an NMF-based topic model based on the duality of the KL divergence and Poisson maximum likelihood estimation. The test correctly identifies whether a topic model based on an NMF approach yields reliable results in simulated and real data.
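As a rough illustration of the idea in the abstract, the sketch below runs a generic double parametric bootstrap goodness-of-fit test around a KL-based NMF fit. It is a minimal sketch under stated assumptions, not the authors' procedure: the test statistic (the generalized KL divergence of the fitted model), the multiplicative-update fitting routine, and every function name and parameter (`fit_nmf_kl`, `double_bootstrap_pvalue`, `B1`, `B2`) are hypothetical choices made for this example.

```python
import numpy as np


def kl_divergence(X, M, eps=1e-10):
    """Generalized KL divergence D(X || M) = sum(X log(X/M) - X + M)."""
    M = np.maximum(M, eps)
    # Zero counts contribute 0 to the log term (0 * log 0 := 0).
    log_term = np.where(X > 0, X * np.log(np.maximum(X, eps) / M), 0.0)
    return float(np.sum(log_term - X + M))


def fit_nmf_kl(X, rank, n_iter=100, seed=0, eps=1e-10):
    """Fit X ~ W @ H with Lee-Seung multiplicative updates for the KL objective."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.uniform(0.1, 1.0, size=(n, rank))
    H = rng.uniform(0.1, 1.0, size=(rank, m))
    for _ in range(n_iter):
        R = X / np.maximum(W @ H, eps)
        W *= (R @ H.T) / np.maximum(H.sum(axis=1), eps)
        R = X / np.maximum(W @ H, eps)
        H *= (W.T @ R) / np.maximum(W.sum(axis=0), eps)[:, None]
    return W, H


def fit_statistic(X, rank, seed=0):
    """Return the KL fit statistic and the fitted Poisson mean matrix."""
    W, H = fit_nmf_kl(X, rank, seed=seed)
    M = W @ H
    return kl_divergence(X, M), M


def double_bootstrap_pvalue(X, rank, B1=20, B2=10, seed=0):
    """Single and double parametric bootstrap p-values for the KL statistic."""
    rng = np.random.default_rng(seed)
    t_obs, M_hat = fit_statistic(X, rank, seed)
    t_first = np.empty(B1)
    p_inner = np.empty(B1)
    for b in range(B1):
        # First level: simulate from the model fitted to the observed data.
        X_b = rng.poisson(M_hat)
        t_first[b], M_b = fit_statistic(X_b, rank, seed)
        # Second level: simulate from the model fitted to the bootstrap sample.
        t_second = np.array(
            [fit_statistic(rng.poisson(M_b), rank, seed)[0] for _ in range(B2)]
        )
        p_inner[b] = np.mean(t_second >= t_first[b])
    p_single = float(np.mean(t_first >= t_obs))
    # Calibrate the first-level p-value against the inner p-values.
    p_double = float(np.mean(p_inner <= p_single))
    return p_single, p_double


if __name__ == "__main__":
    # Toy document-term counts drawn from a rank-2 Poisson model.
    rng = np.random.default_rng(42)
    mean = rng.uniform(0.5, 3.0, (20, 2)) @ rng.uniform(0.5, 3.0, (2, 15))
    X = rng.poisson(mean)
    print(double_bootstrap_pvalue(X, rank=2, B1=10, B2=5))
```

A small adjusted p-value would suggest that the observed corpus fits worse than data actually generated from the fitted Poisson model. The second bootstrap level is there to calibrate the first-level p-value, since the reference distribution is itself built from estimated parameters.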
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
