Learning Topic Models - Going beyond SVD

Sanjeev Arora; Rong Ge; Ankur Moitra

arXiv:1204.1956 · cs.LG · April 13, 2012 · 61 cites

TL;DR

This paper introduces a polynomial-time algorithm for learning topic models based on Nonnegative Matrix Factorization (NMF) rather than SVD. Unlike the earlier SVD-based methods, it does not assume that each document has a single topic, and it extends to models with topic-topic correlations.

Contribution

The paper provides the first polynomial-time algorithm for learning topic models via NMF under the separability assumption, and the algorithm generalizes to correlated topic models.
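The separability assumption is not spelled out on this page. As it is usually stated, a word-topic matrix is separable when every topic has at least one "anchor word", a word with nonzero probability in that topic and zero probability in all others. The following is a minimal sketch of checking that condition; it is not the paper's learning algorithm, and the names `find_anchor_words`, `is_separable`, and the toy matrices are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: checks the separability condition on a known
# word-topic matrix. It does NOT learn the matrix from documents.
# A[i][k] is the probability of word i under topic k (each column sums to 1).

def find_anchor_words(A, tol=1e-12):
    """Map each topic index to the list of word indices that are anchors for it."""
    n_topics = len(A[0])
    anchors = {k: [] for k in range(n_topics)}
    for i, row in enumerate(A):
        # Topics in which word i has nonzero probability.
        support = [k for k, p in enumerate(row) if p > tol]
        if len(support) == 1:
            anchors[support[0]].append(i)
    return anchors


def is_separable(A, tol=1e-12):
    """True when every topic has at least one anchor word."""
    return all(find_anchor_words(A, tol).values())


# Toy example (assumed data): 4 words, 2 topics.
# Word 0 occurs only in topic 0 and word 1 only in topic 1.
separable_A = [
    [0.5, 0.0],
    [0.0, 0.4],
    [0.3, 0.3],
    [0.2, 0.3],
]

# Every word occurs in both topics, so no topic has an anchor word.
non_separable_A = [
    [0.5, 0.1],
    [0.2, 0.4],
    [0.3, 0.5],
]

anchors = find_anchor_words(separable_A)
```

In this toy case the check only inspects a matrix that is already known; the learning problem addressed by the paper is recovering such a matrix from observed documents.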

Findings

01 First polynomial-time NMF-based algorithm for topic models

02 Handles models with topic-topic correlations

03 Works without assuming single-topic documents

Abstract

Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents. A number of foundational works both in machine learning and in theory have suggested a probabilistic model for documents, whereby documents arise as a convex combination of (i.e. distribution on) a small number of topic vectors, each topic vector being a distribution on words (i.e. a vector of word-frequencies). Similar models have since been used in a variety of application areas; the Latent Dirichlet Allocation or LDA model of Blei et al. is especially popular. Theoretical studies of topic modeling focus on learning the model's parameters assuming the data is actually generated from it. Existing approaches for the most part rely on Singular Value Decomposition (SVD), and…
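The generative model described in the abstract can be sketched in a few lines: a document's word distribution is a convex combination of topic vectors, and the document's words are drawn from that mixture. This is a minimal illustration under assumed toy data. The function names, the three-word vocabulary, and the fixed topic weights are hypothetical; LDA in particular draws the weights from a Dirichlet distribution, which is omitted here.

```python
import random

# Illustrative sketch only: topics[k][j] is the probability of word j under
# topic k, and weights[k] is the share of topic k in the document.

def document_word_distribution(topics, weights):
    """Convex combination of the topic vectors, giving one distribution on words."""
    n_words = len(topics[0])
    return [
        sum(w * topic[j] for w, topic in zip(weights, topics))
        for j in range(n_words)
    ]


def sample_document(topics, weights, length, rng):
    """Draw `length` word indices independently from the document's mixture."""
    dist = document_word_distribution(topics, weights)
    return rng.choices(range(len(dist)), weights=dist, k=length)


# Toy example (assumed data): 2 topics over a 3-word vocabulary.
topics = [
    [0.7, 0.3, 0.0],
    [0.0, 0.2, 0.8],
]
weights = [0.25, 0.75]  # the document is 25% topic 0 and 75% topic 1

dist = document_word_distribution(topics, weights)
doc = sample_document(topics, weights, length=20, rng=random.Random(0))
```

Because the weights and each topic vector sum to one, the mixture `dist` is itself a probability distribution on words.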

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Advanced Text Analysis Techniques

Methods: Latent Dirichlet Allocation