Less is More: Learning Prominent and Diverse Topics for Data Summarization

Jian Tang, Cheng Li, Ming Zhang, and Qiaozhu Mei

arXiv:1611.09921·cs.LG·December 2, 2016

TL;DR

This paper introduces diverse topic models that learn fewer, more representative topics for data summarization by using a reinforced random walk, improving diversity and requiring less prior knowledge.

Contribution

The paper integrates a novel reinforced random walk mechanism into classical topic models to increase topic diversity and reduce the number of topics needed for effective summarization.

Findings

01 The models discover more representative topics
02 Diversity among the top topics is enhanced
03 The models require minimal prior knowledge

Abstract

Statistical topic models efficiently facilitate the exploration of large-scale data sets. Many models have been developed and broadly used to summarize the semantic structure in news, science, social media, and digital humanities. However, a common and practical objective in data exploration tasks is not to enumerate all existing topics, but to quickly extract representative ones that broadly cover the content of the corpus, i.e., a few topics that serve as a good summary of the data. Most existing topic models fit exactly as many topics as the user specifies, which imposes an unnecessary burden on users who have limited prior knowledge. We instead propose new models that are able to learn fewer but more representative topics for the purpose of data summarization. We propose a reinforced random walk that allows prominent topics to absorb tokens from similar and smaller…
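The abstract is cut off before it specifies the reinforced random walk, so the following is only a toy sketch of the general idea that prominent topics absorb mass from similar, smaller ones. Everything in it is an assumption made for illustration rather than the paper's formulation: the function name `reinforced_walk`, the `similarity` matrix, the `power` exponent, and the update rule itself.

```python
import numpy as np

def reinforced_walk(sizes, similarity, steps=20, power=2.0):
    """Toy illustration (NOT the paper's model): redistribute topic mass so
    that larger topics absorb mass from similar, smaller topics."""
    sizes = np.asarray(sizes, dtype=float)
    sim = np.asarray(similarity, dtype=float)
    n = sizes.shape[0]
    mass = sizes / sizes.sum()  # normalize token counts to a distribution
    for _ in range(steps):
        # Assumed rule: the weight of moving from topic i to topic j is
        # similarity(i, j) times the reinforced mass of the destination j.
        weights = sim * (mass ** power)[None, :]
        row_sums = weights.sum(axis=1, keepdims=True)
        # Row-normalize; a row with zero total weight keeps its own mass.
        transition = np.divide(weights, row_sums, out=np.eye(n),
                               where=row_sums > 0)
        mass = mass @ transition  # one step of the walk
    return mass

# Hypothetical example: topics 0 and 1 are similar, as are topics 2 and 3.
similarity = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
]
mass = reinforced_walk([40, 10, 30, 20], similarity)
print(mass)
```

In this toy setup, the mass within each group of similar topics drifts toward the larger topic, while topics with no similarity to each other do not exchange mass. Topics whose mass shrinks toward zero could then be dropped, which is one way to picture ending up with fewer topics than were initially specified.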

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques