TL;DR
This paper introduces a flexible weakly-supervised approach for neural text classification that uses pseudo-document generation and self-training to improve performance with limited labeled data.
Contribution
It presents a novel weakly-supervised framework combining pseudo-document generation and self-training, adaptable to various supervision types and compatible with existing neural models.
Findings
Achieves strong performance on multiple real-world datasets.
Outperforms baseline methods significantly.
Effectively handles different types of weak supervision.
Abstract
Deep neural networks are gaining increasing popularity for the classic text classification task, owing to their strong expressive power and reduced need for feature engineering. Despite this appeal, neural text classification models suffer from a lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models, and they support only limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the…
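The two modules named in the abstract can be illustrated with a small, self-contained sketch. This is not the paper's implementation: a Naive Bayes classifier with a uniform class prior stands in for the neural model, pseudo-documents are drawn by naively mixing seed words with background words, and self-training simply keeps predictions above a confidence threshold. All names and parameters (`generate_pseudo_docs`, `self_train`, `seed_ratio`, `threshold`) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def generate_pseudo_docs(seeds, background, n_docs=50, doc_len=20,
                         seed_ratio=0.6, rng=None):
    """Sample pseudo-labeled documents: each token is a class seed word
    with probability seed_ratio, otherwise a background word (assumed scheme)."""
    rng = rng or random.Random(0)
    docs = []
    for label, words in seeds.items():
        for _ in range(n_docs):
            tokens = [rng.choice(words) if rng.random() < seed_ratio
                      else rng.choice(background) for _ in range(doc_len)]
            docs.append((tokens, label))
    return docs


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing and a uniform prior.
    A stand-in for the neural classifier, used only to keep the sketch small."""

    def fit(self, docs):
        self.labels = sorted({y for _, y in docs})
        self.counts = {y: Counter() for y in self.labels}
        for tokens, y in docs:
            self.counts[y].update(tokens)
        self.vocab = {w for c in self.counts.values() for w in c}
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def predict_proba(self, tokens):
        v = len(self.vocab)
        logp = {y: sum(math.log((self.counts[y][w] + 1) / (self.totals[y] + v))
                       for w in tokens if w in self.vocab)
                for y in self.labels}
        top = max(logp.values())
        exp = {y: math.exp(s - top) for y, s in logp.items()}
        z = sum(exp.values())
        return {y: e / z for y, e in exp.items()}


def self_train(pseudo_docs, unlabeled, threshold=0.9, rounds=3):
    """Pre-train on pseudo-documents, then repeatedly refit on the
    pseudo-documents plus confidently labeled real documents."""
    model = NaiveBayes().fit(pseudo_docs)
    for _ in range(rounds):
        confident = []
        for tokens in unlabeled:
            probs = model.predict_proba(tokens)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((tokens, label))
        model = NaiveBayes().fit(pseudo_docs + confident)
    return model


# Toy data, invented for this example.
seeds = {"politics": ["election", "vote", "senate"],
         "sports": ["game", "team", "score"]}
background = ["the", "a", "of", "today", "said"]
unlabeled = [["team", "won", "the", "game"],
             ["senate", "vote", "today"],
             ["coach", "praised", "team"],
             ["election", "results", "said"]]

pseudo_docs = generate_pseudo_docs(seeds, background)
pretrained = NaiveBayes().fit(pseudo_docs)
refined = self_train(pseudo_docs, unlabeled)
```

In this toy setup the pre-trained model has never seen the word "coach", so it cannot separate the classes on it; after self-training, "coach" has been absorbed from a confidently labeled real document and leans toward the sports class. The same pre-train-then-refine loop is the idea the abstract describes, with a neural classifier in place of the stand-in.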
