Supervised Topic Models

David M. Blei; Jon D. McAuliffe

arXiv:1003.0783 · stat.ML · March 4, 2010 · NeurIPS · 1.3k cites

TL;DR

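This paper presents supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents that is used to predict a response value for each new document. It is tested on predicting movie ratings from reviews and the political tone of U.S. Senate amendments from their text, and is compared against regularized regression and against unsupervised LDA followed by a separate regression.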

Contribution

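The paper introduces sLDA, a supervised topic model that extends LDA to labelled documents, and derives an approximate maximum-likelihood procedure for parameter estimation that relies on variational methods to handle intractable posterior expectations.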

Findings

01 sLDA shows benefits over modern regularized regression when predicting responses from text

02 sLDA shows benefits over an unsupervised LDA analysis followed by a separate regression

03 sLDA accommodates a variety of response types

Abstract

We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive an approximate maximum-likelihood procedure for parameter estimation, which relies on variational methods to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and the political tone of amendments in the U.S. Senate based on the amendment text. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.
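
As a rough illustration of the kind of model the abstract describes, the sketch below samples one labelled document from an sLDA-style generative process with a Gaussian response, which is only one of the response types the model accommodates. The function names (`sample_dirichlet`, `generate_document`), the parameter names (`alpha`, `beta`, `eta`, `sigma`), and the toy values are assumptions made for illustration and do not come from this page. The variational estimation procedure is not shown.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw topic proportions from a Dirichlet by normalizing Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, beta, eta, sigma, n_words, rng):
    """Sample (words, z_bar, y) for one document.

    alpha : Dirichlet parameter, one entry per topic (hypothetical values)
    beta  : per-topic word distributions, beta[k][v] = P(word v | topic k)
    eta   : regression coefficients, one per topic
    sigma : standard deviation of the Gaussian response
    """
    n_topics = len(alpha)
    vocab_size = len(beta[0])

    # 1. Topic proportions for this document.
    theta = sample_dirichlet(alpha, rng)

    # 2. One topic assignment per word, then one word per assignment.
    topics = rng.choices(range(n_topics), weights=theta, k=n_words)
    words = [rng.choices(range(vocab_size), weights=beta[z], k=1)[0]
             for z in topics]

    # 3. Empirical topic frequencies of the document.
    z_bar = [topics.count(k) / n_words for k in range(n_topics)]

    # 4. Response: Gaussian centred on a linear function of z_bar.
    mean = sum(e * f for e, f in zip(eta, z_bar))
    y = rng.gauss(mean, sigma)
    return words, z_bar, y


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = [0.5, 0.5]                      # two topics (toy setting)
    beta = [[0.7, 0.3, 0.0, 0.0],           # topic 0 favours words 0 and 1
            [0.0, 0.0, 0.4, 0.6]]           # topic 1 favours words 2 and 3
    eta = [-1.0, 1.0]                       # topic 0 lowers y, topic 1 raises it
    words, z_bar, y = generate_document(alpha, beta, eta, 0.1, 20, rng)
    print(words, z_bar, y)
```

The point of the sketch is that the response depends on the topic assignments that actually generated the document's words, so topics and response are modelled jointly rather than in two separate stages.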

Code & Models

Repositories

labixiaoK/lda

Taxonomy

Topics: Bayesian Methods and Mixture Models · Computational and Text Analysis Methods · Topic Modeling

Methods: Latent Dirichlet Allocation