Learning Topic Hierarchies by Tree-Directed Latent Variable Models

Sunrit Chakraborty; Rayleigh Lei; XuanLong Nguyen

arXiv:2408.14327·math.ST·August 27, 2024

Open Access

TL;DR

This paper introduces a hierarchical topic model with a tree-structured latent hierarchy, providing theoretical guarantees for identifiability and posterior convergence, validated through simulations and real-world data.

Contribution

It develops a novel hierarchical topic model with provable identifiability and convergence properties, advancing interpretability and theoretical understanding in topic modeling.

Findings

01. The model successfully recovers latent hierarchies in simulations.

02. Theoretical bounds for posterior contraction rates are established.

03. A real data application demonstrates practical utility.

Abstract

We study a parametric family of latent variable models, namely topic models, equipped with a hierarchical structure among the topic variables. Such models may be viewed as a finite mixture of the latent Dirichlet allocation (LDA) induced distributions, but the LDA components are constrained by a latent hierarchy, specifically a rooted and directed tree structure, which enables the learning of interpretable and latent topic hierarchies of interest. A mathematical framework is developed in order to establish identifiability of the latent topic hierarchy under suitable regularity conditions, and to derive bounds for posterior contraction rates of the model and its parameters. We demonstrate the usefulness of such models and validate their theoretical properties through a careful simulation study and a real data example using New York Times articles.
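
The abstract does not spell out the generative process, so the sketch below is only one hypothetical reading of "a finite mixture of LDA components constrained by a rooted, directed tree", not the authors' model. It assumes that each mixture component corresponds to a root-to-leaf path, and that a document drawn from that component is an LDA sample restricted to the topics on the path. The tree, the node names, the vocabulary size, and all function names are illustrative assumptions.

```python
import random

# Hypothetical toy tree: node -> parent (None marks the root).
PARENT = {"root": None, "sports": "root", "politics": "root",
          "tennis": "sports", "soccer": "sports"}

def leaves(parent):
    """Return the nodes that are nobody's parent, in sorted order."""
    internal = {p for p in parent.values() if p is not None}
    return sorted(n for n in parent if n not in internal)

def path_to_root(node, parent):
    """Return the root-to-node path by following parent pointers."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def dirichlet(alpha, rng):
    """Sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_document(parent, topics, path_weights, n_words, alpha, rng):
    """Assumed generative step: pick a path, then run LDA on its topics."""
    # Mixture step: choose one root-to-leaf path (one LDA component).
    leaf = rng.choices(leaves(parent), weights=path_weights)[0]
    path = path_to_root(leaf, parent)
    # LDA step: topic proportions over the topics on the chosen path only.
    theta = dirichlet([alpha] * len(path), rng)
    words = []
    for _ in range(n_words):
        topic = rng.choices(path, weights=theta)[0]
        word_probs = topics[topic]
        words.append(rng.choices(range(len(word_probs)), weights=word_probs)[0])
    return path, words

rng = random.Random(0)
# Each node carries its own word distribution over a 6-word toy vocabulary.
topics = {node: dirichlet([0.5] * 6, rng) for node in PARENT}
path, words = sample_document(PARENT, topics, [1, 1, 1], 20, 1.0, rng)
```

Under this reading, components that share a path prefix share topics, which is one way a tree could constrain the LDA components; the paper's actual construction may differ.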

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Advanced Text Analysis Techniques