Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation

Ruibo Liu; Guangxuan Xu; Chenyan Jia; Weicheng Ma; Lili Wang; Soroush Vosoughi

arXiv:2012.02952·cs.CL·December 8, 2020

TL;DR

Data Boost is a reinforcement learning-based text data augmentation framework that improves classifier performance in low-resource scenarios by generating high-quality, class-consistent augmented data.

Contribution

We introduce Data Boost, a novel reinforcement learning guided conditional generation method for effective text data augmentation in NLP tasks.

Findings

01. Data Boost improves F1 by 8.7% on average across the three tasks when only 10% of the data is used for training.

02. It outperforms six prior augmentation methods in diverse classification tasks.

03. Human evaluation confirms the high quality and class consistency of the augmented data.

Abstract

Data augmentation has proven effective in many NLU tasks, especially those suffering from data scarcity. In this paper, we present a powerful and easy-to-deploy text augmentation framework, Data Boost, which augments data through reinforcement learning guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The results show that Data Boost can boost the performance of classifiers, especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N=178), we confirm that Data Boost augmentations are comparable in quality to the original data with respect to readability and class consistency.
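The low-resource setting mentioned in the abstract (training on 10% of the data and reporting F1) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `stratified_subsample` and `macro_f1` are hypothetical, per-class (stratified) sampling is an assumption, and macro averaging is an assumption since the abstract does not say which F1 variant is reported.

```python
import random
from collections import defaultdict


def stratified_subsample(examples, fraction, seed=0):
    """Keep roughly `fraction` of the (text, label) pairs from each class.

    Hypothetical helper: simulates a low-resource training split.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        k = max(1, round(len(items) * fraction))  # keep at least one per class
        subset.extend(rng.sample(items, k))
    return subset


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the labels present in `gold`.

    Assumption: macro averaging; the abstract only says "F1".
    """
    scores = []
    for c in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: 20 examples per class, reduced to a 10% training subset.
data = [(f"pos {i}", "pos") for i in range(20)] + [(f"neg {i}", "neg") for i in range(20)]
small_train = stratified_subsample(data, fraction=0.10)
```

In an experiment of this shape, a classifier would be trained once on `small_train` alone and once on `small_train` plus augmented samples, and the two `macro_f1` scores on a held-out test set would be compared.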
