Learning Topic Hierarchies by Tree-Directed Latent Variable Models
Sunrit Chakraborty, Rayleigh Lei, XuanLong Nguyen

TL;DR
This paper introduces a topic model whose topics are constrained by a latent rooted, directed tree. It proves identifiability of that hierarchy under regularity conditions, derives bounds on posterior contraction rates, and validates both through simulations and New York Times articles.
Contribution
The paper develops a hierarchical topic model with provable identifiability and posterior convergence properties, advancing interpretability and theoretical understanding in topic modeling.
Findings
The model recovers latent hierarchies in simulations
Bounds on posterior contraction rates are established for the model and its parameters
An application to New York Times articles demonstrates practical utility
Abstract
We study a parametric family of latent variable models, namely topic models, equipped with a hierarchical structure among the topic variables. Such models may be viewed as a finite mixture of the latent Dirichlet allocation (LDA) induced distributions, but the LDA components are constrained by a latent hierarchy, specifically a rooted and directed tree structure, which enables the learning of interpretable and latent topic hierarchies of interest. A mathematical framework is developed in order to establish identifiability of the latent topic hierarchy under suitable regularity conditions, and to derive bounds for posterior contraction rates of the model and its parameters. We demonstrate the usefulness of such models and validate their theoretical properties through a careful simulation study and a real data example using New York Times articles.
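To make the "finite mixture of LDA components constrained by a tree" idea concrete, here is a minimal generative sketch in Python. It is not the authors' model specification: the abstract does not say how the tree constrains the components, so the sketch assumes that each mixture component is an LDA restricted to the topics on one root-to-leaf path. The function `sample_corpus`, its parameters, and the toy tree below are all hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_corpus(tree_paths, topics, path_weights, alpha, n_docs, doc_len, seed=0):
    """Sample documents from a finite mixture of LDA components.

    ASSUMPTION (not from the paper): each component is tied to one
    root-to-leaf path of the topic tree and may only use topics on that path.

    tree_paths:   list of paths, each a list of topic indices (root first)
    topics:       (n_topics, vocab_size) array; each row is a word distribution
    path_weights: mixture weights over paths, summing to 1
    alpha:        symmetric Dirichlet concentration for topic proportions
    """
    rng = np.random.default_rng(seed)
    docs, chosen = [], []
    for _ in range(n_docs):
        # Pick which LDA component (tree path) generates this document.
        k = int(rng.choice(len(tree_paths), p=path_weights))
        path = tree_paths[k]
        # Draw per-document proportions over the topics on that path.
        theta = rng.dirichlet(np.full(len(path), alpha))
        # Marginalising the per-word topic assignment gives one word distribution.
        word_dist = theta @ topics[path]
        word_dist = word_dist / word_dist.sum()  # guard against rounding error
        words = rng.choice(topics.shape[1], size=doc_len, p=word_dist)
        docs.append(words)
        chosen.append(k)
    return docs, chosen

# Toy tree: topic 0 is the root, topics 1 and 2 are its children.
tree_paths = [[0, 1], [0, 2]]
topics = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],  # root topic: words 0 and 1
    [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],  # child topic 1: words 2 and 3
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],  # child topic 2: words 4 and 5
])
docs, chosen = sample_corpus(tree_paths, topics, [0.5, 0.5],
                             alpha=1.0, n_docs=20, doc_len=30)
```

In this toy setup every document shares the root topic's vocabulary, while the child-topic words reveal which branch generated it. That branch structure is the kind of latent hierarchy the paper's identifiability results concern, although the paper's actual conditions may differ from this simplification.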
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Advanced Text Analysis Techniques
