Less is More: Learning Prominent and Diverse Topics for Data Summarization
Jian Tang, Cheng Li, Ming Zhang, and Qiaozhu Mei

TL;DR
This paper introduces diverse topic models that use a reinforced random walk to learn fewer, more representative topics for data summarization, improving topic diversity and requiring less prior knowledge about how many topics to fit.
Contribution
It proposes a novel reinforced random walk mechanism integrated into classical topic models to enhance diversity and reduce the number of topics needed for effective summarization.
Findings
Discover more representative topics
Enhance diversity among the top topics
Require minimal prior knowledge
Abstract
Statistical topic models efficiently facilitate the exploration of large-scale data sets. Many models have been developed and broadly used to summarize the semantic structure in news, science, social media, and digital humanities. However, a common and practical objective in data exploration tasks is not to enumerate all existing topics, but to quickly extract representative ones that broadly cover the content of the corpus, i.e., a few topics that serve as a good summary of the data. Most existing topic models fit exactly the number of topics a user specifies, which imposes an unnecessary burden on users who have limited prior knowledge. We instead propose new models that are able to learn fewer but more representative topics for the purpose of data summarization. We propose a reinforced random walk that allows prominent topics to absorb tokens from similar and smaller…
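To make the "prominent topics absorb tokens from similar and smaller topics" idea concrete, here is a toy sketch of a rich-get-richer absorption process. It is not the authors' model: the paper integrates its reinforced random walk into topic model inference, whereas this sketch works only on topic-level token counts. The similarity matrix `sim`, the update rule (tokens in topic k move to topic j with probability proportional to sim[k][j] * n[j]), and the function names `absorb_step` and `reinforced_walk` are all hypothetical assumptions chosen for illustration.

```python
# Hypothetical toy sketch of a rich-get-richer absorption process over topics.
# NOT the paper's algorithm: the update rule and similarity values are assumptions.
# n[k] = token count of topic k; sim[k][j] = assumed similarity, with sim[k][k] = 1.

def absorb_step(n, sim):
    """Redistribute token mass once (expected flow, no sampling).

    Tokens in topic k move to topic j with probability proportional to
    sim[k][j] * n[j], so larger and more similar topics attract more tokens.
    """
    K = len(n)
    new_n = [0.0] * K
    for k in range(K):
        if n[k] == 0:
            continue  # empty topics have nothing to redistribute
        weights = [sim[k][j] * n[j] for j in range(K)]
        total = sum(weights)  # >= n[k] > 0 because sim[k][k] = 1
        for j in range(K):
            new_n[j] += n[k] * weights[j] / total
    return new_n


def reinforced_walk(n, sim, steps=50):
    """Iterate absorb_step; total token mass is conserved at every step."""
    n = [float(x) for x in n]
    for _ in range(steps):
        n = absorb_step(n, sim)
    return n


# Usage: topics 0/1 are similar, topics 2/3 are similar (illustrative values).
sim = [
    [1.0, 0.8, 0.05, 0.05],
    [0.8, 1.0, 0.05, 0.05],
    [0.05, 0.05, 1.0, 0.8],
    [0.05, 0.05, 0.8, 1.0],
]
result = reinforced_walk([100, 20, 10, 50], sim)
print([round(x, 2) for x in result])
```

In this toy setting the smaller member of each similar pair (topics 1 and 2) loses tokens to its larger neighbor, and the largest topic keeps growing, which mirrors the qualitative behavior the abstract describes: fewer, more prominent topics remain.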
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
