SAT: Improving Semi-Supervised Text Classification with Simple Instance-Adaptive Self-Training

Hui Chen; Wei Han; Soujanya Poria

arXiv:2210.12653·cs.CL·October 25, 2022

TL;DR

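The paper introduces SAT, a simple instance-adaptive self-training approach for semi-supervised text classification. It creates two augmented views of each unlabeled example and uses a meta-learner to identify which view is the weaker augmentation (used to produce the pseudo-label) and which is the stronger one (used to train the model).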

Contribution

SAT is a semi-supervised learning method whose meta-learner identifies, for each instance, the relative strength of its two augmentations to enhance text classification performance.

Findings

01. SAT performs competitively with existing semi-supervised methods on three text classification datasets.

02. The adaptive augmentation strategy improves pseudo-label quality.

03. SAT maintains this performance across varying amounts of labeled training data.

Abstract

Self-training methods have been explored in recent years and have exhibited great performance in improving semi-supervised learning. This work presents a Simple instance-Adaptive self-Training method (SAT) for semi-supervised text classification. SAT first generates two augmented views for each unlabeled example and then trains a meta-learner to automatically identify the relative strength of augmentations based on the similarity between the original view and the augmented views. The weakly-augmented view is fed to the model to produce a pseudo-label and the strongly-augmented view is used to train the model to predict the same pseudo-label. We conducted extensive experiments and analyses on three text classification datasets and found that with varying sizes of labeled training data, SAT consistently shows competitive performance compared to existing semi-supervised learning methods. Our…
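
The self-training step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the view more similar to the original counts as the weaker augmentation, and it substitutes a plain token-overlap (Jaccard) score for the learned meta-learner. The names `jaccard`, `rank_views`, `pseudo_label_loss`, and `toy_predict` are hypothetical and exist only for illustration.

```python
import math
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity. ASSUMPTION: a stand-in for SAT's learned meta-learner."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def rank_views(original: str, view_a: str, view_b: str) -> Tuple[str, str]:
    """Return (weak, strong). ASSUMPTION: the view closer to the original is the weak one."""
    if jaccard(original, view_a) >= jaccard(original, view_b):
        return view_a, view_b
    return view_b, view_a


def pseudo_label_loss(
    predict: Callable[[str], List[float]],
    original: str,
    view_a: str,
    view_b: str,
) -> Tuple[int, float]:
    """Pseudo-label from the weak view, cross-entropy of the strong view against it."""
    weak, strong = rank_views(original, view_a, view_b)
    weak_probs = predict(weak)
    # The pseudo-label is the most probable class for the weak view.
    label = max(range(len(weak_probs)), key=weak_probs.__getitem__)
    strong_probs = predict(strong)
    # Clamp to avoid log(0) when the model assigns zero probability.
    loss = -math.log(max(strong_probs[label], 1e-12))
    return label, loss


def toy_predict(text: str) -> List[float]:
    """Hypothetical two-class model: P(positive) rises with the count of 'good'."""
    score = text.lower().split().count("good")
    p_pos = 1.0 / (1.0 + math.exp(-(score - 0.5)))
    return [1.0 - p_pos, p_pos]


if __name__ == "__main__":
    label, loss = pseudo_label_loss(
        toy_predict,
        original="the movie was really good",
        view_a="the film was really good",
        view_b="good",
    )
    print(label, round(loss, 4))
```

In a real training loop the loss on the strong view would be backpropagated through the classifier, and the similarity function would itself be trained. Both are left out here to keep the sketch dependency-free.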

Code & Models

Repositories

declare-lab/sat
PyTorch · Official

Taxonomy

Topics: Text and Document Classification Technologies · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning