Weakly Supervised Prototype Topic Model with Discriminative Seed Words: Modifying the Category Prior by Self-exploring Supervised Signals
Bing Wang, Yue Wang, Ximing Li, Jihong Ouyang

TL;DR
This paper introduces WSPTM, a weakly supervised topic model that enriches document-specific category priors with a prototype scheme and corpus-level label frequency knowledge, improving dataless text classification.
Contribution
It proposes a novel category prior formulation that combines prototype-based label membership with corpus-level label frequency, advancing dataless classification methods.
Findings
WSPTM outperforms baseline methods on real-world datasets.
Incorporating prototype schemes improves label assignment accuracy.
Using corpus frequency knowledge enhances classification performance.
Abstract
Dataless text classification, a new paradigm of weakly supervised learning, refers to the task of learning from unlabeled documents and a few predefined representative words for each category, known as seed words. Recent generative dataless methods construct document-specific category priors using only seed word occurrences; however, such category priors often contain very limited and even noisy supervised signals. To remedy this problem, we propose a novel formulation of the category prior. First, for each document, we consider its label membership degree not only by counting seed word occurrences but also by using a novel prototype scheme, which captures pseudo-nearest neighboring categories. Second, for each label, we consider its frequency prior knowledge in the corpus, which is also discriminative knowledge for classification. By incorporating the proposed…
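Because the abstract is truncated, WSPTM's exact prior formula is not given here. The sketch below is a hypothetical illustration of the three ingredients it names, under stated assumptions. The first is seed-word counts, as used by earlier generative dataless methods. The second is a "prototype" similarity, assumed here to be the cosine to the mean bag-of-words vector of documents pseudo-labeled by their seed hits. The third is a corpus label frequency, estimated from those same pseudo-labels. The additive combination rule, the smoothing `alpha`, the weight `lam`, and every function name are assumptions for illustration only, not the authors' method.

```python
import math
from collections import Counter


def unit(vec):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def normalize(xs):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(xs)
    return [x / total for x in xs]


def category_priors(docs, seeds, alpha=0.1, lam=1.0):
    """Hypothetical document-specific category priors (NOT WSPTM's exact formula).

    docs:  list of token lists; seeds: list of seed-word lists, one per category.
    alpha: assumed smoothing constant; lam: assumed weight of the prototype term.
    """
    k = len(seeds)
    vocab = sorted({w for d in docs for w in d})
    tfs = [Counter(d) for d in docs]

    # 1) Seed-word evidence: occurrences of each category's seed words per document.
    counts = [[sum(tf[w] for w in ws) for ws in seeds] for tf in tfs]

    # 2) Pseudo-labels: category with the most seed hits (None if the doc has none).
    pseudo = [max(range(k), key=lambda c: cs[c]) if sum(cs) > 0 else None
              for cs in counts]

    # 3) Prototypes (assumption): unit-normalized mean of pseudo-labeled doc vectors.
    vecs = [unit([tf[w] for w in vocab]) for tf in tfs]
    protos = []
    for c in range(k):
        members = [v for v, y in zip(vecs, pseudo) if y == c]
        if members:
            mean = [sum(col) / len(members) for col in zip(*members)]
        else:
            mean = [0.0] * len(vocab)
        protos.append(unit(mean))

    # 4) Corpus label frequency from pseudo-labels, with add-one smoothing.
    freq = normalize([sum(y == c for y in pseudo) + 1 for c in range(k)])

    # 5) Combine (assumption): (seed hits + lam * cosine + alpha) * label frequency.
    priors = []
    for cs, v in zip(counts, vecs):
        sims = [sum(a * b for a, b in zip(v, p)) for p in protos]
        scores = [(cs[c] + lam * sims[c] + alpha) * freq[c] for c in range(k)]
        priors.append(normalize(scores))
    return priors


# Toy corpus: category 0 = sports, category 1 = politics (illustrative data).
docs = [
    ["ball", "goal", "team", "win"],
    ["team", "coach", "goal"],
    ["vote", "party", "election"],
    ["party", "senate", "vote", "law"],
    ["coach", "team", "season"],
    ["senate", "law", "budget"],  # contains no seed words at all
]
seeds = [["goal", "team"], ["vote", "election"]]
priors = category_priors(docs, seeds)
```

The last toy document contains no seed words, so a prior built from seed counts alone would be uniform apart from smoothing. In this sketch, its overlap with the politics prototype (through "senate" and "law") tilts its prior toward category 1. One simplification worth noting: each pseudo-labeled document also contributes to its own prototype, and a leave-one-out variant would avoid that.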
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning in Bioinformatics · Topic Modeling
