Topic Modeling over Short Texts by Incorporating Word Embeddings

Jipeng Qiang; Ping Chen; Tong Wang; Xindong Wu

arXiv:1609.08496 · cs.CL · September 28, 2016 · 20 cites

TL;DR

This paper introduces the Embedding-based Topic Model (ETM), a topic modeling approach for short texts that uses word embeddings as a source of word correlation knowledge to improve topic coherence and mitigate data sparsity.

Contribution

The paper proposes an embedding-based topic model that aggregates short texts into longer pseudo-texts and adds Markov Random Field (MRF) regularization, so that semantically correlated words are more likely to be assigned to the same topic.
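The aggregation idea can be illustrated with a small sketch. It assumes hypothetical 2-D toy word vectors and uses a simplified stand-in for the paper's clustering: each short text is represented by the average of its word vectors, texts are grouped k-means style by cosine similarity, and each group's tokens are concatenated into one pseudo-text. The embeddings, the helper names (`text_vector`, `aggregate`), and the averaged-vector distance are all assumptions for illustration. The paper's own distance measure and clustering details are not reproduced here.

```python
import math
from typing import Dict, List

# Hypothetical 2-D toy embeddings; real ones would be pretrained on a large corpus.
EMB: Dict[str, List[float]] = {
    "goal": [0.9, 0.1], "match": [0.8, 0.2], "striker": [0.85, 0.15],
    "stock": [0.1, 0.9], "market": [0.2, 0.8], "shares": [0.15, 0.85],
}
DIM = 2


def text_vector(tokens: List[str]) -> List[float]:
    """Average the embeddings of in-vocabulary tokens (zero vector if none are known)."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(DIM)]


def cosine(a: List[float], b: List[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def aggregate(texts: List[List[str]], k: int, iters: int = 10) -> List[List[str]]:
    """Cluster short texts k-means style (cosine on averaged vectors), then
    concatenate each cluster's tokens into one pseudo-text."""
    vecs = [text_vector(t) for t in texts]
    centroids = [list(v) for v in vecs[:k]]  # naive init: the first k texts
    assign = [0] * len(texts)
    for _ in range(iters):
        # Assignment step: each text joins its most similar centroid.
        assign = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vecs]
        # Update step: recompute the mean vector of each non-empty cluster.
        for c in range(k):
            members = [vecs[i] for i, a in enumerate(assign) if a == c]
            if members:
                centroids[c] = [sum(m[d] for m in members) / len(members) for d in range(DIM)]
    pseudo: List[List[str]] = [[] for _ in range(k)]
    for tokens, c in zip(texts, assign):
        pseudo[c].extend(tokens)
    return pseudo


texts = [["goal", "match"], ["stock", "market"], ["striker", "goal"], ["shares", "market"]]
pseudo_texts = aggregate(texts, k=2)
```

With these toy vectors, the two sports texts end up in one pseudo-text and the two finance texts in the other. A topic model then sees each merged group as a single, longer document.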

Findings

1. ETM outperforms state-of-the-art models on real-world datasets.
2. Incorporating word embeddings improves topic coherence.
3. Using MRF regularization enhances word-topic assignments.
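The following sketch shows how an MRF-style term can pull correlated words toward the same topic during Gibbs sampling. Word pairs in a document whose embedding cosine similarity reaches a threshold are linked. A token's unnormalized topic weights are then multiplied by exp(λ × fraction of its linked neighbors currently assigned to that topic). The threshold, λ, the toy vectors, the function names, and the `base` weights are all hypothetical. In an LDA-style sampler, `base` would be the usual collapsed-Gibbs term. The paper's exact potential and parameter values may differ.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def correlated_neighbors(tokens: List[str], emb: Dict[str, List[float]],
                         threshold: float) -> Dict[int, List[int]]:
    """Link positions of distinct words in one document whose vectors are similar enough."""
    nbrs: Dict[int, List[int]] = {i: [] for i in range(len(tokens))}
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            wi, wj = tokens[i], tokens[j]
            if wi != wj and wi in emb and wj in emb and cosine(emb[wi], emb[wj]) >= threshold:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def mrf_topic_probs(base: List[float], i: int, z: List[int],
                    nbrs: Dict[int, List[int]], lam: float) -> List[float]:
    """Scale token i's base topic weights by exp(lam * share of its neighbors
    in each topic), then normalize to a probability distribution."""
    n = nbrs[i]
    weights = []
    for k, w in enumerate(base):
        share = sum(1 for j in n if z[j] == k) / len(n) if n else 0.0
        weights.append(w * math.exp(lam * share))
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy data: "goal" and "striker" are close, "market" is not.
emb = {"goal": [0.9, 0.1], "striker": [0.85, 0.15], "market": [0.2, 0.8]}
tokens = ["goal", "striker", "market"]
nbrs = correlated_neighbors(tokens, emb, threshold=0.9)
z = [0, 0, 1]  # current topic assignment of each token position
probs = mrf_topic_probs([0.5, 0.5], 0, z, nbrs, lam=1.0)
```

Here "goal" starts with equal base weights for both topics. Its only linked neighbor, "striker", sits in topic 0, so the MRF factor shifts the probability toward topic 0. A token with no linked neighbors keeps its base distribution unchanged.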

Abstract

Inferring topics from the overwhelming volume of short texts has become a critical but challenging problem for many content analysis tasks, such as content characterization, user interest profiling, and emerging topic detection. Existing methods such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) do not handle this problem well, because short texts provide very limited word co-occurrence information. This paper studies how to incorporate external word correlation knowledge into short texts to improve the coherence of topic modeling. Building on recent results in word embeddings, which learn semantic representations of words from a large corpus, we introduce a novel method, the Embedding-based Topic Model (ETM), to learn latent topics from short texts. ETM not only solves the problem of very limited word co-occurrence information by…
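A toy count illustrates why short texts starve LDA-style models. These models learn from words that co-occur within the same document, and a two- or three-word text contributes only a handful of such pairs. The sketch below counts distinct within-document word pairs for a few made-up short texts, first as separate documents and then merged into one pseudo-text. The data and the helper name `cooccurring_pairs` are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple


def cooccurring_pairs(docs: List[List[str]]) -> Set[Tuple[str, str]]:
    """Distinct unordered word pairs that appear together in at least one document."""
    pairs: Set[Tuple[str, str]] = set()
    for doc in docs:
        vocab = sorted(set(doc))  # sort so each pair has one canonical order
        pairs.update(combinations(vocab, 2))
    return pairs


short_texts = [["goal", "match"], ["striker", "goal"], ["match", "referee"]]
separate = cooccurring_pairs(short_texts)
merged = cooccurring_pairs([[w for t in short_texts for w in t]])
print(len(separate), len(merged))
```

Kept separate, these texts yield 3 co-occurring pairs. Merged, they yield 6. Merging unrelated texts would also create spurious pairs, which is why aggregation should group texts that are semantically similar.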

Code & Models

Repositories

williamscott701/Embedding-LJST

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Advanced Text Analysis Techniques