Self-training Improves Pre-training for Natural Language Understanding

Jingfei Du; Edouard Grave; Beliz Gunel; Vishrav Chaudhary; Onur Celebi; Michael Auli; Ves Stoyanov; Alexis Conneau

arXiv:2010.02194 · cs.CL · October 6, 2020 · 46 cites

TL;DR

This paper shows that self-training, combined with a new data augmentation method called SentAugment, complements pre-training for natural language understanding, yielding gains of up to 2.6% on standard text classification benchmarks without requiring in-domain unlabeled data.

Contribution

The paper introduces SentAugment, a task-specific data augmentation technique that enables scalable self-training using web-crawled unlabeled sentences, improving NLP performance without in-domain data.
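In self-training as generally understood, a teacher model trained on the labeled set assigns pseudo-labels to unlabeled sentences, and a student is then trained on those pseudo-labeled sentences. The sketch below illustrates only the selection step, under assumptions that are not confirmed by this page: the teacher's class probabilities are given as a NumPy array, and the most confident `per_class_k` sentences are kept for each predicted class. The function name `select_pseudo_labeled` and its parameters are hypothetical.

```python
import numpy as np

def select_pseudo_labeled(teacher_probs, per_class_k):
    """Return sorted indices of the most confident examples per predicted class.

    teacher_probs: array of shape (n_sentences, n_classes), rows sum to 1.
    per_class_k:   how many sentences to keep for each class (an assumed rule).
    """
    preds = teacher_probs.argmax(axis=1)   # teacher's hard prediction
    conf = teacher_probs.max(axis=1)       # confidence of that prediction
    keep = []
    for c in range(teacher_probs.shape[1]):
        idx = np.flatnonzero(preds == c)               # sentences predicted as class c
        idx = idx[np.argsort(-conf[idx])][:per_class_k]  # most confident first
        keep.extend(idx.tolist())
    return sorted(keep)

# Toy teacher outputs for five unlabeled sentences and two classes.
probs = np.array([
    [0.90, 0.10],
    [0.60, 0.40],
    [0.20, 0.80],
    [0.45, 0.55],
    [0.70, 0.30],
])
kept = select_pseudo_labeled(probs, per_class_k=1)
print(kept)  # [0, 2]
```

A student could then be trained on the kept sentences, using either the hard predictions or the teacher's probability rows `probs[kept]` as soft targets.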

Findings

01 Up to 2.6% improvement on text classification benchmarks

02 Effective in knowledge-distillation and few-shot learning scenarios

03 Self-training complements strong pre-trained models like RoBERTa

Abstract

Unsupervised pre-training has led to much recent progress in natural language understanding. In this paper, we study self-training as another way to leverage unlabeled data through semi-supervised learning. To obtain additional data for a specific task, we introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data to retrieve sentences from a bank of billions of unlabeled sentences crawled from the web. Unlike previous semi-supervised methods, our approach does not require in-domain unlabeled data and is therefore more generally applicable. Experiments show that self-training is complementary to strong RoBERTa baselines on a variety of tasks. Our augmentation approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks. Finally, we also show strong gains on…
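The retrieval step described in the abstract can be pictured as a nearest-neighbor search in embedding space. The following is a minimal sketch, not the authors' implementation: it assumes sentence embeddings are already available as NumPy arrays, builds a single task-level query by averaging the labeled embeddings (one simple choice among several possible), and ranks bank sentences by cosine similarity. The function names `task_query` and `retrieve` are hypothetical, and a bank of billions of sentences would need an approximate nearest-neighbor index rather than this brute-force matrix product.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def task_query(labeled_embeddings):
    """Average the labeled-sentence embeddings into one unit-length query."""
    return l2_normalize(labeled_embeddings.mean(axis=0))

def retrieve(query, bank_embeddings, k):
    """Return indices and cosine scores of the k bank sentences nearest the query."""
    scores = l2_normalize(bank_embeddings) @ query
    top = np.argsort(-scores)[:k]
    return top, scores[top]

# Toy 2-D embeddings standing in for a real sentence encoder's output.
labeled = np.array([[1.0, 0.0], [0.8, 0.2]])
bank = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])

query = task_query(labeled)
top, top_scores = retrieve(query, bank, k=2)
print(top.tolist())  # [0, 2]
```

The retrieved sentences would then serve as the unlabeled pool for self-training on the task.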

Code & Models

Repositories

facebookresearch/SentAugment (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Dense Connections · Layer Normalization · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Attention Is All You Need