Data Augmentation using Pre-trained Transformer Models

Varun Kumar; Ashutosh Choudhary; Eunah Cho

arXiv:2003.02245·cs.CL·February 2, 2021·141 cites

Open Access · 4 Repos

TL;DR

This paper investigates how different transformer-based pre-trained models can be used for data augmentation in NLP, demonstrating that class label prepending and Seq2Seq models improve low-resource classification performance.

Contribution

It introduces a simple method of conditioning pre-trained models with class labels for data augmentation and compares various transformer architectures across multiple benchmarks.

Findings

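01 On three classification benchmarks, a pre-trained Seq2Seq model (BART) outperforms other data augmentation methods in a low-resource setting.

02 Prepending the class label to each text sequence is a simple and effective way to condition pre-trained models for augmentation.

03 Augmentation methods built on different pre-trained models differ in how much diversity they add to the data and in how well they preserve class-label information.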

Abstract

Pre-trained language models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of transformer-based pre-trained models, namely auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART), for conditional data augmentation. We show that prepending the class labels to text sequences provides a simple yet effective way to condition the pre-trained models for data augmentation. Additionally, on three classification benchmarks, a pre-trained Seq2Seq model outperforms other data augmentation methods in a low-resource setting. Further, we explore how data augmentation methods based on different pre-trained models differ in terms of data diversity, and how well such methods preserve the class-label information.
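
The label-prepending idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the separator and end-of-sequence markers (`SEP`, `EOS`) and the helper names (`prepend_label`, `build_training_corpus`, `generation_prompt`, `strip_label`) are assumptions made for illustration. The sketch only shows how labeled examples might be turned into label-conditioned sequences for fine-tuning, and how a label-only prompt might then be used to request new text for a class.

```python
from typing import List, Tuple

# Hypothetical marker strings; the abstract does not specify them.
SEP = "[SEP]"
EOS = "<|endoftext|>"


def prepend_label(label: str, text: str, sep: str = SEP) -> str:
    """Condition a text on its class by putting the label in front of it."""
    return f"{label} {sep} {text}"


def build_training_corpus(
    examples: List[Tuple[str, str]], sep: str = SEP, eos: str = EOS
) -> str:
    """Join label-conditioned examples into one fine-tuning string."""
    conditioned = [prepend_label(label, text, sep) for label, text in examples]
    return f" {eos} ".join(conditioned) + f" {eos}"


def generation_prompt(label: str, sep: str = SEP) -> str:
    """Prompt that asks a fine-tuned model for a new example of `label`."""
    return f"{label} {sep}"


def strip_label(generated: str, label: str, sep: str = SEP) -> str:
    """Remove the label prefix from generated text, if it is present."""
    prefix = generation_prompt(label, sep)
    if generated.startswith(prefix):
        return generated[len(prefix):].strip()
    return generated.strip()


if __name__ == "__main__":
    data = [("positive", "great movie"), ("negative", "dull plot")]
    print(build_training_corpus(data))
    print(generation_prompt("positive"))
```

In this sketch, a model fine-tuned on the output of `build_training_corpus` would be prompted with `generation_prompt(label)`, and `strip_label` would recover the synthetic example to add to the training set under that label.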

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Sigmoid Activation · Tanh Activation · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding