Geometric Dirichlet Means algorithm for topic inference

Mikhail Yurochkin; XuanLong Nguyen

arXiv:1610.09034 · stat.ML · October 31, 2016 · NeurIPS · 1 cites

TL;DR

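This paper introduces a geometric algorithm for topic inference in LDA models that avoids the computational and statistical inefficiencies of Gibbs sampling and variational inference while reaching accuracy comparable to that of a Gibbs sampler. The topic estimates are shown to be statistically consistent under some conditions, and the method is evaluated on simulated and real data.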

Contribution

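The paper contributes a geometric algorithm for topic inference: a fast, optimization-based weighted clustering procedure with geometric corrections, derived from a geometric loss function that serves as a surrogate for the LDA likelihood. It is more computationally efficient than Gibbs sampling and variational inference while matching the accuracy of a Gibbs sampler.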

Findings

01 Achieves accuracy comparable to that of a Gibbs sampler

02 Overcomes the computational and statistical inefficiencies of methods based on Gibbs sampling and variational inference

03 Topic estimates are shown to be statistically consistent under some conditions

Abstract

We propose a geometric algorithm for topic learning and inference that is built on the convex geometry of topics arising from the Latent Dirichlet Allocation (LDA) model and its nonparametric extensions. To this end we study the optimization of a geometric loss function, which is a surrogate to the LDA's likelihood. Our method involves a fast optimization based weighted clustering procedure augmented with geometric corrections, which overcomes the computational and statistical inefficiencies encountered by other techniques based on Gibbs sampling and variational inference, while achieving the accuracy comparable to that of a Gibbs sampler. The topic estimates produced by our method are shown to be statistically consistent under some conditions. The algorithm is evaluated with extensive experiments on simulated and real data.
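
The procedure outlined in the abstract (weighted clustering of documents, followed by a geometric correction) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that documents are represented by normalized word frequencies, that clustering weights are document lengths, and that the correction pushes each cluster centroid away from the corpus center along a ray before projecting back onto the simplex. The function names, the `extension` scalar, and the clipping step are hypothetical choices made for illustration.

```python
import random


def normalize(vec):
    """Scale a non-negative vector so its entries sum to 1."""
    total = sum(vec)
    return [v / total for v in vec]


def weighted_mean(points, weights):
    """Weighted average of equal-length vectors."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[j] for p, w in zip(points, weights)) / total
            for j in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Plain Lloyd iterations with per-point weights (illustrative only)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = weighted_mean([points[i] for i in members],
                                           [weights[i] for i in members])
    return centers


def gdm_sketch(doc_counts, k, extension=1.5, seed=0):
    """Estimate k topics from word-count vectors (hypothetical helper).

    Assumptions: documents are weighted by length, and `extension` is a
    user-chosen scalar controlling how far each centroid is pushed out.
    """
    lengths = [sum(d) for d in doc_counts]
    freqs = [normalize(d) for d in doc_counts]
    center = weighted_mean(freqs, lengths)
    centroids = weighted_kmeans(freqs, lengths, k, seed=seed)
    topics = []
    for mu in centroids:
        # Geometric correction: move along the ray from the corpus center
        # through the centroid, then clip negatives and renormalize.
        beta = [c + extension * (m - c) for c, m in zip(center, mu)]
        topics.append(normalize([max(b, 0.0) for b in beta]))
    return topics


# Toy corpus over a 4-word vocabulary; counts are made up for illustration.
docs = [[8, 2, 0, 0], [7, 3, 1, 0], [9, 1, 0, 1],
        [0, 1, 6, 4], [1, 0, 5, 5], [0, 0, 4, 6]]
topics = gdm_sketch(docs, k=2)
```

Each entry of `topics` is a probability vector over the vocabulary. With `extension=1.0` the sketch reduces to weighted k-means centroids; larger values move the estimates toward the boundary of the simplex.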

Taxonomy

Topics: Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference · Genetic and phenotypic traits in livestock