Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge

Ryan J. Gallagher; Kyle Reing; David Kale; Greg Ver Steeg

arXiv:1611.10277 · cs.CL · September 5, 2018

TL;DR

CorEx is an information-theoretic approach to topic modeling that avoids complex generative assumptions and effectively incorporates minimal domain knowledge through anchor words, producing high-quality topics.

Contribution

This paper introduces CorEx, a non-generative, information-theoretic topic modeling method that easily integrates domain knowledge via anchor words and extends to hierarchical and semi-supervised settings.

Findings

01. CorEx produces topics comparable to LDA in quality.

02. CorEx effectively incorporates minimal domain knowledge.

03. CorEx generalizes to hierarchical and semi-supervised models.

Abstract

While generative models such as Latent Dirichlet Allocation (LDA) have proven fruitful in topic modeling, they often require detailed assumptions and careful specification of hyperparameters. Such model complexity issues only compound when trying to generalize generative models to incorporate human input. We introduce Correlation Explanation (CorEx), an alternative approach to topic modeling that does not assume an underlying generative model, and instead learns maximally informative topics through an information-theoretic framework. This framework naturally generalizes to hierarchical and semi-supervised extensions with no additional modeling assumptions. In particular, word-level domain knowledge can be flexibly incorporated within CorEx through anchor words, allowing topic separability and representation to be promoted with minimal human intervention. Across a variety of datasets,…
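The "maximally informative topics" idea can be made concrete with a small worked example. The snippet below is a minimal sketch, assuming that "informative" is measured by total correlation, TC(X) = sum_i H(X_i) - H(X), and by the part of it that a topic label Y explains, TC(X; Y) = TC(X) - TC(X | Y). It is not the CorEx algorithm and not the corex_topic API. The function names, the three-word vocabulary, and the toy documents are all hypothetical and exist only for illustration.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy, in bits, of a list of hashable outcomes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def total_correlation(rows):
    """TC(X) = sum_i H(X_i) - H(X) for rows of discrete variables."""
    columns = list(zip(*rows))
    marginal = sum(entropy(list(col)) for col in columns)
    joint = entropy([tuple(r) for r in rows])
    return marginal - joint


def explained_tc(rows, labels):
    """TC(X; Y) = TC(X) - TC(X | Y), with Y given as one label per row."""
    n = len(rows)
    conditional = 0.0
    for y in set(labels):
        group = [r for r, lab in zip(rows, labels) if lab == y]
        conditional += (len(group) / n) * total_correlation(group)
    return total_correlation(rows) - conditional


# Hypothetical toy corpus: binary indicators for ("rocket", "orbit", "goal").
docs = [(1, 1, 0), (1, 1, 0), (0, 0, 1), (0, 0, 1)]

# A topic label that separates the two kinds of document explains all
# of the dependence among the words; an unrelated label explains none.
good_topic = [0, 0, 1, 1]
bad_topic = [0, 1, 0, 1]

print(explained_tc(docs, good_topic))  # 2.0 bits
print(explained_tc(docs, bad_topic))   # 0.0 bits
```

In this toy setting, a label that makes the words conditionally independent scores highest. That is the intuition behind preferring topics that explain correlations among words, without specifying how documents are generated.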

Code & Models

Repositories

gregversteeg/corex_topic (Official)

Taxonomy

Methods: Latent Dirichlet Allocation