Topic Modeling of Hierarchical Corpora
Do-kyum Kim, Geoffrey M. Voelker, Lawrence K. Saul

TL;DR
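This paper introduces a parametric approach to topic modeling of hierarchically organized corpora, together with a simple variational inference method. On several benchmarks it is faster than Gibbs sampling and learns more predictive models than existing variational methods for nonparametric HDPs, and it is applied to large-scale corpora with complex hierarchies.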
Contribution
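It presents a simple variational approximation for finite-dimensional instances of hierarchical Dirichlet processes (HDPs). The approximation relies on a previously unexploited inequality that handles the conditional dependence between Dirichlet latent variables in adjacent levels of the hierarchy, which makes inference efficient and scalable.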
Findings
Faster than Gibbs sampling for nonparametric HDPs on several benchmarks
Learns more predictive models than existing variational methods
Successfully applied to large-scale security corpora with complex hierarchies
Abstract
We study the problem of topic modeling in corpora whose documents are organized in a multi-level hierarchy. We explore a parametric approach to this problem, assuming that the number of topics is known or can be estimated by cross-validation. The models we consider can be viewed as special (finite-dimensional) instances of hierarchical Dirichlet processes (HDPs). For these models we show that there exists a simple variational approximation for probabilistic inference. The approximation relies on a previously unexploited inequality that handles the conditional dependence between Dirichlet latent variables in adjacent levels of the model's hierarchy. We compare our approach to existing implementations of nonparametric HDPs. On several benchmarks we find that our approach is faster than Gibbs sampling and able to learn more predictive models than existing variational methods. Finally, we…
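The dependence between Dirichlet variables in adjacent levels, mentioned in the abstract, can be illustrated with a minimal generative sketch. This is not the paper's inference method, only one plausible way to draw topic proportions down a hierarchy, assuming each child's Dirichlet is centered on its parent's proportions. The function name `sample_hierarchy`, the `concentration` parameter, the symmetric prior at the root, and the toy tree are all hypothetical choices made for illustration.

```python
import numpy as np

def sample_hierarchy(tree, root, num_topics, concentration, rng):
    """Draw topic proportions for every node of a hierarchy (illustrative only).

    tree maps each node name to a list of its children; leaves may be absent.
    """
    # Assumption: the root gets a symmetric Dirichlet prior over a fixed number of topics.
    theta = {root: rng.dirichlet(np.ones(num_topics))}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in tree.get(parent, []):
            # Assumption: the child's Dirichlet mean equals the parent's proportions,
            # so adjacent levels are conditionally dependent.
            # The small floor keeps every Dirichlet parameter strictly positive.
            alpha = concentration * theta[parent] + 1e-12
            theta[child] = rng.dirichlet(alpha)
            stack.append(child)
    return theta

# Toy three-level hierarchy: corpus -> categories -> documents (hypothetical names).
tree = {"corpus": ["catA", "catB"], "catA": ["doc1", "doc2"], "catB": ["doc3"]}
theta = sample_hierarchy(tree, "corpus", num_topics=5, concentration=50.0,
                         rng=np.random.default_rng(0))
```

A larger `concentration` keeps each child's proportions closer to its parent's, while a smaller value lets sub-categories and documents drift further from the corpus-wide mixture. Exact posterior inference over these coupled variables is what the variational approximation described in the abstract is meant to make tractable.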
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Bayesian Methods and Mixture Models
