A Spectral Algorithm for Latent Dirichlet Allocation
Animashree Anandkumar, Dean P. Foster, Daniel Hsu, Sham M. Kakade, Yi-Kai Liu

TL;DR
This paper introduces Excess Correlation Analysis (ECA), a spectral algorithm that recovers the latent topics of LDA models from low-order moments (up to third order, i.e., trigram statistics), with guaranteed recovery of the topic distributions and the topic prior.
Contribution
The paper presents a spectral method for learning LDA parameters from low-order moments. The method is simple and scalable, and it is guaranteed to recover the topic distributions.
Findings
Successfully recovers topic vectors and priors using trigram statistics.
Uses spectral decomposition on low-order moments for parameter estimation.
Algorithm is scalable with respect to the number of topics.
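A minimal sketch of the spectral step mentioned above, in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: it assumes the second- and third-order moments have already been adjusted so that they take the diagonalized forms M2 = Σ w_i O_i O_iᵀ and M3 = Σ v_i O_i ⊗ O_i ⊗ O_i, where O_i is the word distribution of topic i, and it builds these moments exactly from a synthetic topic matrix instead of estimating them from text. The names `recover_topics`, `O_true`, `M2`, and `M3` are hypothetical.

```python
import numpy as np

def recover_topics(M2, M3, k, seed=0):
    """Whiten with M2, then eigendecompose a random contraction of M3."""
    rng = np.random.default_rng(seed)
    # Top-k eigenpairs of the symmetric second-moment matrix.
    evals, evecs = np.linalg.eigh(M2)
    top = np.argsort(evals)[::-1][:k]
    U, s = evecs[:, top], evals[top]
    W = U / np.sqrt(s)                      # d x k, so that W.T @ M2 @ W = I_k
    # Contract the third mode of M3 with W @ theta for a random theta,
    # then whiten the remaining two modes to get a k x k symmetric matrix.
    theta = rng.standard_normal(k)
    T = np.einsum("abc,c->ab", M3, W @ theta)
    A = W.T @ T @ W
    _, V = np.linalg.eigh((A + A.T) / 2)    # eigenvectors = whitened topics
    # Undo the whitening; each column is a topic up to scale and sign.
    C = (U * np.sqrt(s)) @ V
    # Rescale so each column sums to 1 (this also fixes the sign).
    return C / C.sum(axis=0, keepdims=True)

# Synthetic check (all values below are assumptions for illustration).
rng = np.random.default_rng(1)
d, k = 8, 3
O_true = rng.dirichlet(np.ones(d), size=k).T     # d x k, columns sum to 1
alpha = np.array([0.5, 1.0, 1.5])                # Dirichlet prior parameters
a0 = alpha.sum()
w = alpha / (a0 * (a0 + 1))                      # weights of the adjusted pairs
v = 2 * alpha / (a0 * (a0 + 1) * (a0 + 2))       # weights of the adjusted triples
M2 = (O_true * w) @ O_true.T
M3 = np.einsum("i,ai,bi,ci->abc", v, O_true, O_true, O_true)
O_hat = recover_topics(M2, M3, k)                # columns match O_true up to order
```

The recovered columns come back in arbitrary order, so any comparison with `O_true` has to match columns up to permutation. With moments estimated from real data instead of exact ones, the recovery would only be approximate.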
Abstract
The problem of topic modeling can be seen as a generalization of the clustering problem, in that it posits that observations are generated due to multiple latent factors (e.g., the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic probability vectors (the distributions over words for each topic), when only the words are observed and the corresponding topics are hidden. We provide a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of mixture models, including the popular latent Dirichlet allocation (LDA) model. For LDA, the procedure correctly recovers both the topic probability vectors and the prior over the topics, using only trigram statistics (i.e.,…
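As a rough illustration of what "trigram statistics" can look like in practice, the sketch below estimates first-, second-, and third-order word co-occurrence moments from a corpus. It assumes each document has been reduced to three integer word ids in `range(vocab_size)`; the function name `empirical_moments` and the toy corpus are hypothetical, and the paper's own estimators may differ in detail.

```python
import numpy as np

def empirical_moments(docs, vocab_size):
    """Estimate word, word-pair, and word-triple frequencies.

    docs: array-like of shape (n_docs, 3) holding integer word ids.
    Returns (mu, pairs, triples) with shapes (V,), (V, V), (V, V, V).
    """
    docs = np.asarray(docs)
    n = docs.shape[0]
    x1, x2, x3 = docs[:, 0], docs[:, 1], docs[:, 2]
    # First-order moment: frequency of each word in the first position.
    mu = np.bincount(x1, minlength=vocab_size) / n
    # Second-order moment: joint frequency of (first word, second word).
    pairs = np.zeros((vocab_size, vocab_size))
    np.add.at(pairs, (x1, x2), 1.0)
    # Third-order moment: joint frequency of all three words.
    triples = np.zeros((vocab_size, vocab_size, vocab_size))
    np.add.at(triples, (x1, x2, x3), 1.0)
    return mu, pairs / n, triples / n

# Toy corpus of four three-word documents over a vocabulary of size 3.
docs = [[0, 1, 2], [0, 1, 1], [2, 0, 1], [0, 2, 2]]
mu, pairs, triples = empirical_moments(docs, vocab_size=3)
```

The dense third-order array needs V³ entries, so this direct form is only practical for small vocabularies; it is meant to show the quantities involved, not an efficient way to compute them.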
Taxonomy
Topics: Bayesian Methods and Mixture Models · Text and Document Classification Technologies · Topic Modeling
