Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts
Yu Zhang, Yunyi Zhang, Martin Michalski, Yucheng Jiang, Yu Meng, Jiawei Han

TL;DR
This paper introduces SeedTopicMine, an iterative framework that combines multiple context signals—local, pre-trained, and retrieved sentence contexts—to improve seed-guided topic discovery, resulting in more coherent and accurate topics.
Contribution
The paper proposes a novel method that integrates three types of context information for seed-guided topic discovery, improving topic quality over existing approaches.
Findings
SeedTopicMine outperforms existing seed-guided methods in topic coherence and accuracy.
Combining multiple contexts provides complementary semantic information.
The improvements are consistent across various datasets and seed sets.
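As a rough illustration of how several context signals could complement each other, the sketch below fuses three ranked candidate-term lists for one seed using mean reciprocal rank. This is an assumption made for illustration only: the summary does not say how SeedTopicMine actually combines its contexts, and the function name `fuse_rankings`, the seed "sports", and the toy term lists are all hypothetical.

```python
from collections import defaultdict


def fuse_rankings(rankings):
    """Fuse several ranked term lists by mean reciprocal rank.

    Assumed fusion rule for illustration, not the paper's confirmed method.
    A term absent from a list contributes 0 for that list.
    """
    totals = defaultdict(float)
    for ranked_terms in rankings:
        for position, term in enumerate(ranked_terms, start=1):
            totals[term] += 1.0 / position
    n_lists = len(rankings)
    mean_rr = {term: total / n_lists for term, total in totals.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(mean_rr, key=lambda term: (-mean_rr[term], term))


# Hypothetical candidate rankings for the seed "sports", one per context type.
local_context = ["soccer", "league", "coach", "stadium"]
pretrained_lm = ["soccer", "coach", "athlete", "league"]
retrieved_sentences = ["league", "soccer", "athlete", "coach"]

top_terms = fuse_rankings([local_context, pretrained_lm, retrieved_sentences])
print(top_terms)
```

In this toy run, a term ranked highly by only one signal (such as "stadium") falls below terms that several signals agree on, which is the intuition behind treating the contexts as complementary.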
Abstract
Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest. To model the semantic correlation between words and seeds for discovering topic-indicative terms, existing seed-guided approaches utilize different types of context signals, such as document-level word co-occurrences, sliding window-based local contexts, and generic linguistic knowledge brought by pre-trained language models. In this work, we analyze and show empirically that each type of context information has its value and limitation in modeling word semantics under seed guidance, but combining three types of contexts (i.e., word embeddings learned from local contexts, pre-trained language model representations…
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Text and Document Classification Technologies
