Topic Modeling based on Keywords and Context

Johannes Schneider

arXiv:1710.02650·cs.CL·February 6, 2018

TL;DR

This paper introduces a novel topic modeling approach that uses characteristic keywords to improve interpretability, efficiency, and consistency of topics, addressing limitations of existing models like LDA.

Contribution

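It proposes a keyword-based topic model that self-regulates the number of topics, together with a simple, parallelizable inference algorithm. On 9 datasets it reports gains in classification accuracy, PMI score, computational performance, and consistency of topic assignments within documents.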

Findings

01. Qualitative results comparable to LDA, with different strengths and weaknesses

02. Higher classification accuracy and PMI scores

03. Better computational performance and more consistent topic assignments within documents

Abstract

Current topic models often suffer from discovering topics that do not match human intuition, unnatural switching of topics within documents, and high computational demands. We address these concerns by proposing a topic model and an inference algorithm based on automatically identifying characteristic keywords for topics. Keywords influence the topic assignments of nearby words. Our algorithm learns (key)word-topic scores and self-regulates the number of topics. Inference is simple and easily parallelizable. Qualitative analysis yields results comparable to state-of-the-art models (e.g., LDA), but with different strengths and weaknesses. Quantitative analysis using 9 datasets shows gains in classification accuracy, PMI score, computational performance, and consistency of topic assignments within documents, while most often using fewer topics.
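The abstract reports PMI score as one of its evaluation metrics without spelling out how it is computed. As a rough illustration only, the sketch below averages the standard pointwise mutual information, log(p(w1, w2) / (p(w1) p(w2))), over all pairs of a topic's top words, with probabilities estimated from document-level co-occurrence. The function name `topic_pmi`, the document-level co-occurrence window, and the `eps` smoothing term are assumptions of this sketch and are not taken from the paper.

```python
import math
from itertools import combinations


def topic_pmi(top_words, documents, eps=1e-12):
    """Average pairwise PMI of a topic's top words (hypothetical helper).

    Probabilities are estimated as the fraction of documents containing
    the word(s); this estimator is an assumption, not the paper's setup.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        joint = prob(w1, w2)
        # eps avoids log(0) for pairs that never co-occur (sketch assumption).
        scores.append(math.log((joint + eps) / (prob(w1) * prob(w2) + eps)))
    if not scores:
        raise ValueError("need at least two top words to form a pair")
    return sum(scores) / len(scores)


# Toy corpus of tokenized documents, invented purely for illustration.
docs = [
    ["topic", "model", "lda"],
    ["topic", "model"],
    ["cat", "dog"],
    ["cat", "dog", "pet"],
]
coherent = topic_pmi(["topic", "model"], docs)
incoherent = topic_pmi(["topic", "cat"], docs)
```

On this toy corpus, words that always appear together ("topic", "model") get a positive score, while words that never share a document ("topic", "cat") get a strongly negative one, which is the intuition behind using PMI as a proxy for topic coherence.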
