Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds
Yu Zhang, Yu Meng, Xuan Wang, Sheng Wang, Jiawei Han

TL;DR
This paper introduces SeeTopic, a novel framework for seed-guided topic discovery that effectively incorporates out-of-vocabulary seeds and leverages pre-trained language models to improve topic relevance and diversity.
Contribution
It generalizes seed-guided topic discovery to handle out-of-vocabulary seeds by integrating PLMs and local corpus semantics.
Findings
SeeTopic improves topic coherence, accuracy, and diversity.
It effectively utilizes out-of-vocabulary seeds.
Experiments on multiple datasets validate its effectiveness.
Abstract
Discovering latent topics from text corpora has been studied for decades. Many existing topic models adopt a fully unsupervised setting, and their discovered topics may not cater to users' particular interests due to their inability to leverage user guidance. Although there exist seed-guided topic discovery approaches that leverage user-provided seeds to discover topic-representative terms, they are less concerned with two factors: (1) the existence of out-of-vocabulary seeds and (2) the power of pre-trained language models (PLMs). In this paper, we generalize the task of seed-guided topic discovery to allow out-of-vocabulary seeds. We propose a novel framework, named SeeTopic, wherein the general knowledge of PLMs and the local semantics learned from the input corpus can mutually benefit each other. Experiments on three real datasets from different domains demonstrate the…
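To make the task setting concrete, here is a minimal toy sketch of seed-guided term ranking in which the seeds do not occur in the corpus vocabulary. It is not the SeeTopic algorithm: the vectors are made up, `general` is a hypothetical stand-in for PLM-derived embeddings (available for any string), `local` is a hypothetical stand-in for embeddings learned from the input corpus (available only for in-vocabulary terms), and the equal-weight averaging of the two similarity scores is an assumption chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy vectors, invented for this example.
# "general": stand-in for PLM embeddings, defined for seeds and terms alike.
general = {
    "sports": [1.0, 0.0],
    "soccer": [0.9, 0.1],
    "tennis": [0.8, 0.2],
    "politics": [0.0, 1.0],
    "election": [0.1, 0.9],
    "senate": [0.2, 0.8],
}
# "local": stand-in for corpus-trained embeddings. The seeds "sports" and
# "politics" are missing here, i.e. they are out-of-vocabulary.
local = {
    "soccer": [0.95, 0.05],
    "tennis": [0.9, 0.1],
    "election": [0.05, 0.95],
    "senate": [0.1, 0.9],
}


def top_terms(seed, k=2):
    """Rank in-vocabulary terms by similarity to a seed that may be OOV."""
    scores = {}
    for term in local:
        if term == seed:
            continue
        score = cosine(general[seed], general[term])
        # An in-vocabulary seed also gets a corpus-specific signal;
        # an OOV seed has to rely on the general embedding alone.
        if seed in local:
            score = 0.5 * (score + cosine(local[seed], local[term]))
        scores[term] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]


topics = {seed: top_terms(seed) for seed in ["sports", "politics"]}
```

With these toy vectors, the out-of-vocabulary seed "sports" retrieves "soccer" and "tennis", and "politics" retrieves "election" and "senate", even though neither seed appears in `local`.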
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
