Topic representation: finding more representative words in topic models

Jinjin Chi; Jihong Ouyang; Changchun Li; Xueyang Dong; Ximing Li; Xinhua Wang

arXiv:1810.10307·cs.IR·October 25, 2018

Open Access

TL;DR

This paper proposes a new method for selecting more representative words in topic models by reranking top words based on their distribution across all topics, improving alignment with human judgment.

Contribution

It introduces three reranking techniques to enhance topic word representation, addressing the limitations of the standard top-M word list.

Findings

01. Reranked word lists better match human judgments

02. The methods improve topic interpretability

03. Experimental results show increased representativeness

Abstract

The top word list, i.e., the top-M words with the highest marginal probability in a given topic, is the standard topic representation in topic models. Most recent automatic topic labeling algorithms and popular topic quality metrics are based on it. However, we find empirically that the words in this type of top word list are not always representative. The objective of this paper is to find more representative top word lists for topics. To achieve this, we rerank the words in a given topic by further considering their marginal probability over every other topic. The reranked list of top-M words serves as a novel topic representation for topic models. We investigate three reranking methodologies, using (1) standard deviation weight, (2) standard deviation weight with topic size, and (3) chi-square (χ²) statistic selection. Experimental results on real-world collections indicate that…
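The reranking idea can be illustrated with a small sketch. The abstract does not give the paper's formulas, so the scoring rule below is an assumption made purely for illustration: each word's probability in the target topic is multiplied by the standard deviation of that word's probability across all topics. The names `phi`, `top_words`, and `reranked_top_words` are hypothetical and do not come from the paper.

```python
from statistics import pstdev


def top_words(phi, vocab, k, m):
    """Standard representation: the m words with highest probability in topic k."""
    order = sorted(range(len(vocab)), key=lambda w: phi[k][w], reverse=True)
    return [vocab[w] for w in order[:m]]


def reranked_top_words(phi, vocab, k, m):
    """Rerank words in topic k using a standard-deviation weight.

    ASSUMPTION (not taken from the paper): the score of word w is
    p(w | topic k) times the population standard deviation of
    p(w | topic) over all topics, so words that are equally likely
    in every topic are pushed down the list.
    """
    def score(w):
        across_topics = [row[w] for row in phi]
        return phi[k][w] * pstdev(across_topics)

    order = sorted(range(len(vocab)), key=score, reverse=True)
    return [vocab[w] for w in order[:m]]


# Toy topic-word matrix: rows are topics, columns follow `vocab`.
vocab = ["the", "model", "gene", "court"]
phi = [
    [0.40, 0.30, 0.25, 0.05],  # topic 0
    [0.40, 0.30, 0.05, 0.25],  # topic 1
]

print(top_words(phi, vocab, 0, 2))
print(reranked_top_words(phi, vocab, 0, 2))
```

In this toy matrix, "the" and "model" have identical probabilities in both topics, so they dominate the plain top word list for topic 0 yet receive a zero weight under the assumed score, and "gene" moves to the first position. The second and third variants named in the abstract (topic size and χ² selection) would replace the body of `score`.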

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques