Empath: Understanding Topic Signals in Large-Scale Text

Ethan Fast; Binbin Chen; Michael Bernstein

arXiv:1602.06979 · cs.CL · February 24, 2016

TL;DR

Empath is a tool that generates and validates lexical categories from a few seed words, using a neural embedding trained on more than 1.8 billion words of modern fiction, so that text can be analyzed across a broad range of topics.

Contribution

Empath introduces a method for creating new lexical categories on demand from a small seed set: a neural embedding proposes related terms, and a crowd-powered filter validates them.

Findings

1. Empath's categories correlate highly with LIWC categories (r=0.906).
2. It can generate relevant new categories from minimal seed words.
3. Empath analyzes text across 200 pre-validated categories.
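
To make the category analysis concrete, here is a minimal sketch of lexicon-based counting. It is not Empath's implementation: the two-category lexicon, the function name `analyze`, and the `normalize` flag are hypothetical stand-ins chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon (assumption); the tool described here
# ships 200 pre-validated categories.
TOY_LEXICON = {
    "violence": {"bleed", "punch", "stab"},
    "government": {"senate", "law", "vote"},
}


def analyze(text, lexicon, normalize=False):
    """Count how many tokens of `text` fall into each lexical category."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, terms in lexicon.items():
            if token in terms:
                counts[category] += 1
    # Report every category, including those with no matches.
    result = {category: float(counts[category]) for category in lexicon}
    if normalize and tokens:
        # Divide by document length so texts of different sizes compare.
        result = {c: n / len(tokens) for c, n in result.items()}
    return result


scores = analyze("The senate will vote on the law after the punch.", TOY_LEXICON)
normalized = analyze(
    "The senate will vote on the law after the punch.", TOY_LEXICON, normalize=True
)
```

In this toy run the sentence has ten tokens, three of which match `government` and one of which matches `violence`, so normalizing divides those counts by ten.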

Abstract

Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated from common topics in our web dataset, like neglect, government, and social media. We show that Empath's data-driven, human validated categories are highly correlated (r=0.906)…
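
The seed-expansion step described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' code: the three-dimensional vectors are invented, averaging the seed vectors and ranking by cosine similarity are assumed choices, the helper names are hypothetical, and the crowd-powered validation step is not modeled.

```python
import math

# Invented 3-dimensional word vectors (assumption). The paper's embedding
# is learned from more than 1.8 billion words of modern fiction.
TOY_VECTORS = {
    "bleed": [0.9, 0.1, 0.0],
    "punch": [0.8, 0.2, 0.1],
    "stab": [0.85, 0.15, 0.05],
    "wound": [0.9, 0.2, 0.0],
    "tweet": [0.0, 0.1, 0.95],
    "senate": [0.1, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_category(seeds, vectors, top_n=2):
    """Return the top_n non-seed words closest to the averaged seed vector."""
    dims = len(next(iter(vectors.values())))
    # Average the seed vectors into a single query vector.
    query = [sum(vectors[s][i] for s in seeds) / len(seeds) for i in range(dims)]
    candidates = [w for w in vectors if w not in seeds]
    ranked = sorted(candidates, key=lambda w: cosine(query, vectors[w]), reverse=True)
    return ranked[:top_n]


# Seeds taken from the abstract's "violence" example.
violence_terms = expand_category(["bleed", "punch"], TOY_VECTORS)
```

With these invented vectors, the words nearest to the seeds are the violence-related ones, while `tweet` and `senate` rank last. In the pipeline the abstract describes, the candidate terms would then pass through a human filter before the category is accepted.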

Code & Models

Repositories

manavkaushik/fake-news-dection-using-NLP
tf