Weakly-Supervised Neural Text Classification

Yu Meng; Jiaming Shen; Chao Zhang; Jiawei Han

arXiv:1809.01478 · cs.IR · September 13, 2018

TL;DR

This paper introduces a weakly-supervised approach to neural text classification for settings where training data is scarce: pseudo-labeled documents generated from seed information pre-train the model, and self-training on real unlabeled data then refines it.

Contribution

It presents a novel weakly-supervised framework combining pseudo-document generation and self-training, adaptable to various supervision types and compatible with existing neural models.

Findings

1. Achieves strong performance on multiple real-world datasets.
2. Outperforms baseline methods significantly.
3. Effectively handles different types of weak supervision.

Abstract

Deep neural networks are gaining increasing popularity for the classic text classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural text classification models suffer from the lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the…
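
The two modules named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and the abstract does not specify either module in detail. Two assumptions are made here: pseudo-documents are produced by mixing class seed keywords with background words at a fixed ratio, and self-training targets are obtained by squaring the current predictions and renormalising, which is one common sharpening rule. The names `generate_pseudo_documents`, `sharpen_targets`, and `seed_ratio` are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def generate_pseudo_documents(
    seed_words: Dict[str, List[str]],
    background: List[str],
    docs_per_class: int,
    doc_len: int,
    seed_ratio: float,
    rng: random.Random,
) -> List[Tuple[List[str], str]]:
    """Build pseudo-labeled documents from seed keywords (simplified stand-in).

    Each token is a seed keyword of the class with probability `seed_ratio`,
    otherwise a word drawn from the shared background vocabulary.
    """
    docs = []
    for label, words in seed_words.items():
        for _ in range(docs_per_class):
            tokens = [
                rng.choice(words) if rng.random() < seed_ratio else rng.choice(background)
                for _ in range(doc_len)
            ]
            docs.append((tokens, label))
    return docs


def sharpen_targets(probs: List[List[float]]) -> List[List[float]]:
    """Turn current predictions on unlabeled documents into sharper targets.

    Assumed rule: square each probability, divide by the soft frequency of
    its class over all documents, then renormalise each row to sum to 1.
    """
    n_classes = len(probs[0])
    # Soft frequency of each class across the unlabeled set.
    class_freq = [sum(row[j] for row in probs) for j in range(n_classes)]
    targets = []
    for row in probs:
        weights = [
            row[j] ** 2 / class_freq[j] if class_freq[j] > 0 else 0.0
            for j in range(n_classes)
        ]
        total = sum(weights)
        targets.append([w / total for w in weights])
    return targets


# Toy usage: two classes, three pseudo-documents each.
pseudo_docs = generate_pseudo_documents(
    seed_words={"sports": ["game", "team"], "politics": ["vote", "senate"]},
    background=["the", "a", "of"],
    docs_per_class=3,
    doc_len=10,
    seed_ratio=0.5,
    rng=random.Random(0),
)

# Predictions a pre-trained classifier might give on three unlabeled documents.
predictions = [[0.6, 0.4], [0.5, 0.5], [0.2, 0.8]]
targets = sharpen_targets(predictions)
```

In a full pipeline, a neural classifier would first be trained on `pseudo_docs`, then repeatedly retrained toward `targets` computed from its own predictions on unlabeled text until the predicted labels stop changing. The classifier itself is left out of the sketch because the method is described as compatible with existing neural models.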

Code & Models

Repositories

yumeng5/WeSTClass (Official)
