Conic Scan-and-Cover algorithms for nonparametric topic modeling
Mikhail Yurochkin, Aritra Guha, XuanLong Nguyen

TL;DR
This paper introduces novel conic scan-and-cover algorithms for nonparametric topic modeling that accurately estimate topics without prior knowledge of their number, leveraging geometric properties of the topic simplex.
Contribution
The paper presents a new geometric approach to nonparametric topic modeling. Its algorithms are fast and statistically consistent under some conditions, and, unlike methods that require the number of topics to be specified in advance, they estimate that number from the data.
Findings
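The algorithms achieve topic-estimation accuracy comparable to that of a Gibbs sampler, which requires the number of topics to be given.
The methods are among the fastest of several state-of-the-art parametric techniques.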
Statistical consistency is theoretically established.
Abstract
We propose new algorithms for topic modeling when the number of topics is unknown. Our approach relies on an analysis of the concentration of mass and angular geometry of the topic simplex, a convex polytope constructed by taking the convex hull of vertices representing the latent topics. Our algorithms are shown in practice to have topic-estimation accuracy comparable to that of a Gibbs sampler, which requires the number of topics to be given. Moreover, they are among the fastest of several state-of-the-art parametric techniques. Statistical consistency of our estimator is established under some conditions.
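To make the "topic simplex" in the abstract concrete, here is a minimal Python sketch of the geometry only. It is not the paper's conic scan-and-cover algorithm. The function name `sample_topic_simplex`, the uniform Dirichlet priors, and all sizes are assumptions chosen for illustration. Each topic is a probability vector over the vocabulary (a vertex), and each document's word distribution is a convex combination of those vertices, so it lies inside their convex hull.

```python
import numpy as np


def sample_topic_simplex(num_topics=3, vocab_size=6, num_docs=5, seed=0):
    """Draw hypothetical topics and documents lying in the topic simplex."""
    rng = np.random.default_rng(seed)
    # Each row is one topic: a probability distribution over the vocabulary.
    # These rows are the vertices of the topic simplex.
    topics = rng.dirichlet(np.ones(vocab_size), size=num_topics)
    # Each row holds one document's topic proportions; they are nonnegative
    # and sum to 1, so they act as barycentric coordinates.
    weights = rng.dirichlet(np.ones(num_topics), size=num_docs)
    # A document's word distribution is a convex combination of the vertices,
    # which places it inside the convex hull of the topics.
    doc_word_probs = weights @ topics
    return topics, weights, doc_word_probs


topics, weights, doc_word_probs = sample_topic_simplex()
```

In this sketch the number of topics is fixed by hand; the setting the paper addresses is the harder one where that number is unknown and must be recovered from the documents.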
Taxonomy
Topics: Bayesian Methods and Mixture Models · Natural Language Processing Techniques · Topic Modeling
