CobwebTM: Probabilistic Concept Formation for Lifelong and Hierarchical Topic Modeling
Karthik Singaravadivelan, Anant Gupta, Zekun Wang, Christopher J. MacLellan

TL;DR
CobwebTM is an online hierarchical topic model that uses incremental probabilistic concept formation with document embeddings to discover and organize topics dynamically without predefining their number.
Contribution
It adapts the Cobweb algorithm to continuous embeddings, enabling lifelong, hierarchical, and unsupervised topic modeling with minimal tuning.
Findings
Achieves strong topic coherence across datasets
Maintains stable topics over time
Constructs high-quality semantic hierarchies
Abstract
Topic modeling seeks to uncover latent semantic structure in text corpora with minimal supervision. Neural approaches achieve strong performance but require extensive tuning and struggle with lifelong learning due to catastrophic forgetting and fixed capacity, while classical probabilistic models lack flexibility and adaptability to streaming data. We introduce CobwebTM, a low-parameter lifelong hierarchical topic model based on incremental probabilistic concept formation. By adapting the Cobweb algorithm to continuous document embeddings, CobwebTM constructs semantic hierarchies online, enabling unsupervised topic discovery, dynamic topic creation, and hierarchical organization without predefining the number of topics. Across diverse datasets, CobwebTM achieves strong topic coherence, stable topics over time, and high-quality hierarchies, demonstrating that incremental symbolic concept…
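The abstract describes adapting Cobweb's incremental concept formation to continuous document embeddings. The sketch below illustrates that general idea only; it is not CobwebTM's implementation. Each concept node keeps a running per-dimension Gaussian summary (Welford updates). Each new embedding is sorted down the tree by choosing, at every level, between adding it to the best existing child and creating a new child. Options are scored with a Cobweb/3-style continuous category utility, which we assume here. The `ACUITY` floor, the names `ConceptNode`, `insert`, and `category_utility`, and the toy 2-D vectors standing in for document embeddings are all hypothetical. Cobweb's merge and split operators are omitted for brevity.

```python
import math

ACUITY = 1.0  # hypothetical floor on per-dimension std dev (Cobweb/3-style "acuity")


class ConceptNode:
    """Hypothetical concept node: a per-dimension Gaussian over its instances."""

    def __init__(self, dim):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations (Welford)
        self.children = []

    @classmethod
    def from_instance(cls, x):
        node = cls(len(x))
        node.update(x)
        return node

    def update(self, x):
        # Welford's online update of count, mean, and squared deviations.
        self.count += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (xi - self.mean[i])

    def copy_stats(self):
        # Copy the Gaussian summary only; children are not copied.
        node = ConceptNode(len(self.mean))
        node.count, node.mean, node.m2 = self.count, list(self.mean), list(self.m2)
        return node

    def with_instance(self, x):
        # Hypothetical copy with x added, used to score an option without mutating.
        node = self.copy_stats()
        node.update(x)
        return node

    def inv_std_sum(self):
        # Sum over dimensions of 1/sigma, with sigma floored at ACUITY.
        return sum(1.0 / max(math.sqrt(m / self.count), ACUITY) for m in self.m2)


def category_utility(parent, children):
    # Assumed continuous category utility: P(child)-weighted gain in summed
    # 1/sigma over the parent, averaged over children, scaled by 1/(2*sqrt(pi)).
    parent_score = parent.inv_std_sum()
    gain = sum((c.count / parent.count) * (c.inv_std_sum() - parent_score)
               for c in children)
    return gain / (len(children) * 2.0 * math.sqrt(math.pi))


def insert(root, x):
    """Sort embedding x down the tree, updating concepts; return its leaf."""
    node = root
    while True:
        if not node.children:
            if node.count == 0:  # empty tree: the root absorbs the first instance
                node.update(x)
                return node
            # Fringe split: the old leaf and the new instance become siblings.
            leaf = ConceptNode.from_instance(x)
            node.children = [node.copy_stats(), leaf]
            node.update(x)
            return leaf
        node.update(x)
        kids = node.children
        # Option A: add x to child k (scored on copies, tree left untouched).
        add_scores = [
            category_utility(node, [c.with_instance(x) if j == k else c
                                    for j, c in enumerate(kids)])
            for k in range(len(kids))
        ]
        # Option B: create a new singleton child for x.
        new_leaf = ConceptNode.from_instance(x)
        new_score = category_utility(node, kids + [new_leaf])
        best = max(range(len(kids)), key=lambda k: add_scores[k])
        if new_score > add_scores[best]:
            kids.append(new_leaf)
            return new_leaf
        node = kids[best]  # descend and repeat at the next level


# Toy usage: 2-D vectors standing in for document embeddings.
root = ConceptNode(2)
for vec in [[0.0, 0.0], [10.0, 10.0], [0.1, 0.0], [10.0, 10.1]]:
    insert(root, vec)
print([c.mean for c in root.children])  # two top-level concepts, one per cluster
```

Because this sketch uses only the add and create operators, an unlucky insertion order can lock in a poor hierarchy; full Cobweb's merge and split operators exist to reduce that order sensitivity.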
