Document Classification by Inversion of Distributed Language Representations

Matt Taddy

arXiv:1504.07295·cs.CL·July 27, 2015

TL;DR

This paper shows that distributed language representations can be inverted via Bayes rule to build classifiers, and that on Yelp review data these classifiers perform as well as or better than complex purpose-built algorithms.

Contribution

It introduces a simple, modular method to convert any distributed language model into a classifier through Bayesian inversion.

Findings

01

Performs as well as or better than complex purpose-built algorithms on Yelp reviews

02

Applicable to any language representation trained as a probabilistic model

03

Simple and modular approach for classification using distributed representations

Abstract

There have been many recent advances in the structure and measurement of distributed language models: those that map from words to a vector-space that is rich in information about word choice and composition. This vector-space is the distributed language representation. The goal of this note is to point out that any distributed representation can be turned into a classifier through inversion via Bayes rule. The approach is simple and modular, in that it will work with any language representation whose training can be formulated as optimizing a probability model. In our application to 2 million sentences from Yelp reviews, we also find that it performs as well as or better than complex purpose-built algorithms.
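The inversion step described in the abstract can be illustrated with a small sketch. It assumes that a separate language model has already been trained on the documents of each class, and that each model can score a new document d with a log-likelihood log p(d | c). Bayes rule then gives p(c | d) proportional to p(d | c) p(c). The function name `invert_by_bayes`, the uniform default prior, and the example scores are all hypothetical choices made for illustration; they are not taken from the paper or its repository.

```python
import math

def invert_by_bayes(log_likelihoods, priors=None):
    """Return posterior class probabilities p(class | document).

    log_likelihoods: dict mapping class label -> log p(document | class),
        as scored by a language model trained on that class alone.
    priors: optional dict mapping class label -> prior probability.
        Defaults to a uniform prior (an assumption of this sketch).
    """
    labels = list(log_likelihoods)
    if priors is None:
        priors = {c: 1.0 / len(labels) for c in labels}
    # Unnormalized log posterior: log p(d | c) + log p(c).
    log_joint = {c: log_likelihoods[c] + math.log(priors[c]) for c in labels}
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_joint.values())
    unnorm = {c: math.exp(v - m) for c, v in log_joint.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}

# Hypothetical log-likelihoods of one review under two class-specific models.
scores = {"positive": -42.0, "negative": -45.0}
posterior = invert_by_bayes(scores)
predicted = max(posterior, key=posterior.get)
```

Because the function only consumes log-likelihoods, any representation whose training optimizes a probability model could in principle supply the scores, which is the modularity the abstract refers to.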

Code & Models

Repositories

taddylab/deepir (official)
