A Practical Algorithm for Topic Modeling with Provable Guarantees

Sanjeev Arora; Rong Ge; Yoni Halpern; David Mimno; Ankur Moitra; David Sontag; Yichen Wu; Michael Zhu

arXiv:1212.4777 · cs.LG · December 20, 2012 · 165 cites

TL;DR

This paper introduces a practical, efficient algorithm for topic model inference with provable guarantees. It produces results comparable to the best MCMC implementations while running orders of magnitude faster.

Contribution

The authors develop a new algorithm for topic model inference that combines provable theoretical guarantees with practical efficiency and robustness to violations of model assumptions.

Findings

01 Produces results comparable to the best MCMC implementations

02 Runs orders of magnitude faster than those MCMC implementations

03 Provides provable bounds on inference quality

Abstract

Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora. Most approaches to topic model inference have been based on a maximum likelihood objective. Efficient algorithms exist that approximate this objective, but they have no provable guarantees. Recently, algorithms have been introduced that provide provable bounds, but these algorithms are not practical because they are inefficient and not robust to violations of model assumptions. In this paper we present an algorithm for topic model inference that is both provable and practical. The algorithm produces results comparable to the best MCMC implementations while running orders of magnitude faster.

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Text and Document Classification Technologies