A Geometrical Approach to Topic Model Estimation
Zheng Tracy Ke

TL;DR
This paper introduces a novel geometrical approach leveraging a simplex structure to improve the estimation of topic models from text data using SVD, supported by theoretical analysis and empirical validation.
Contribution
It reveals a low-dimensional simplex structure linking the low-rank topic matrix to SVD, enabling more effective topic model estimation.
Findings
The proposed method accurately recovers topic structures in simulations.
The approach demonstrates strong performance on real-world datasets.
Theoretical analysis provides convergence rates for the estimation.
Abstract
In probabilistic topic models, the quantity of interest, a low-rank matrix consisting of topic vectors, is hidden in the text corpus matrix and masked by noise, and the Singular Value Decomposition (SVD) is a potentially useful tool for learning such a low-rank matrix. However, the connection between this low-rank matrix and the singular vectors of the text corpus matrix is usually complicated and hard to spell out, so using SVD to learn topic models is challenging. In this paper, we overcome the challenge by revealing a surprising insight: there is a low-dimensional simplex structure that can be viewed as a bridge between the low-rank matrix of interest and the SVD of the text corpus matrix, and that allows us to conveniently reconstruct the former using the latter. Such an insight motivates a new SVD approach to learning topic models, which we analyze with delicate random…
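The abstract does not spell out how the simplex is constructed. As a rough illustration only, the following Python sketch shows one way such a structure can appear in a noiseless setting. Everything in it is an assumption of the sketch rather than a detail stated above: the synthetic matrices `A` and `W`, the presence of anchor words (words that occur in a single topic), the use of entrywise ratios of singular vectors, and the fact that the simplex vertices are read off from the known anchor rows instead of being estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, K = 30, 200, 3  # vocabulary size, number of documents, number of topics

# Hypothetical topic matrix A (p x K): each column is a distribution over words.
# Assumption: the first K words are anchor words (word k occurs only in topic k).
A = rng.uniform(0.1, 1.0, size=(p, K))
A[:K, :] = np.eye(K)
A /= A.sum(axis=0, keepdims=True)

# Hypothetical topic weights W (K x n): each column is one document's topic mixture.
W = rng.dirichlet(np.ones(K), size=n).T

# Noiseless corpus matrix of rank K (a real corpus would add sampling noise).
D0 = A @ W

# First K left singular vectors of the corpus matrix.
U, _, _ = np.linalg.svd(D0, full_matrices=False)
Xi = U[:, :K].copy()
if Xi[:, 0].sum() < 0:
    Xi[:, 0] = -Xi[:, 0]  # fix the sign so the leading vector is positive

# Entrywise ratios against the leading singular vector:
# each word becomes one point in R^(K-1).
R = Xi[:, 1:] / Xi[:, [0]]

# In this noiseless setup the anchor-word rows are the simplex vertices.
# They are known here by construction; in practice they would have to be estimated.
vertices = R[:K]

# Barycentric coordinates of every word with respect to the K vertices:
# solve Pi @ [vertices, 1] = [R, 1].
M = np.hstack([vertices, np.ones((K, 1))])
B = np.hstack([R, np.ones((p, 1))])
Pi = np.linalg.solve(M.T, B.T).T

# Undo the ratio scaling and renormalize columns to recover the topic matrix.
A_hat = Xi[:, [0]] * Pi
A_hat /= A_hat.sum(axis=0, keepdims=True)

print("smallest barycentric weight:", Pi.min())
print("max abs error in recovered A:", np.abs(A_hat - A).max())
```

In this idealized setting every row of `R` is a convex combination of the `K` anchor rows, so the barycentric weights are nonnegative up to rounding and `A_hat` matches `A` up to numerical error. With noisy word counts neither property holds exactly, which is where an estimation procedure and its analysis become necessary.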
Taxonomy
Topics: Topic Modeling · Stochastic Gradient Optimization Techniques · Advanced Graph Neural Networks
