A Survey of Data Augmentation Approaches for NLP

Steven Y. Feng; Varun Gangal; Jason Wei; Sarath Chandar; Soroush Vosoughi; Teruko Mitamura; Eduard Hovy

arXiv:2105.03075 · cs.CL · December 3, 2021

TL;DR

This survey reviews data augmentation techniques for NLP, covering methods, applications, challenges, and future directions. It is motivated by growing work in low-resource domains and by large-scale neural networks that require large amounts of training data.

Contribution

It provides a structured summary of existing NLP data augmentation approaches, clarifies the research landscape, and offers a GitHub resource for ongoing updates.

Findings

01. Summarizes major data augmentation methods for NLP

02. Highlights application-specific techniques for NLP tasks

03. Outlines challenges and future research directions

Abstract

Data augmentation has recently seen increased interest in NLP due to more work in low-resource domains, new tasks, and the popularity of large-scale neural networks that require large amounts of training data. Despite this recent upsurge, this area is still relatively underexplored, perhaps due to the challenges posed by the discrete nature of language data. In this paper, we present a comprehensive and unifying survey of data augmentation for NLP by summarizing the literature in a structured manner. We first introduce and motivate data augmentation for NLP, and then discuss major methodologically representative approaches. Next, we highlight techniques that are used for popular NLP applications and tasks. We conclude by outlining current challenges and directions for future research. Overall, our paper aims to clarify the landscape of existing literature in data augmentation for NLP…
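To make the abstract's point about the discrete nature of language data concrete, here is a minimal sketch of two simple token-level perturbations, random swap and random deletion. It is an illustration only and is not taken from the survey or from any listed repository. The function names (`random_swap`, `random_deletion`, `augment`), the whitespace tokenization, and the default parameters are all assumptions chosen for the example.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random position pairs exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # Avoid returning an empty sentence: fall back to one random token.
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_aug=4, p_delete=0.1, n_swaps=1, seed=0):
    """Produce n_aug perturbed variants, alternating swap and deletion.

    Hypothetical helper: whitespace tokenization is an assumption and is
    too crude for many languages and tasks.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            new_tokens = random_swap(tokens, n_swaps, rng)
        else:
            new_tokens = random_deletion(tokens, p_delete, rng)
        variants.append(" ".join(new_tokens))
    return variants


if __name__ == "__main__":
    for variant in augment("data augmentation creates extra training examples"):
        print(variant)
```

Unlike a small pixel shift in an image, moving or dropping a single token can change the meaning or the label of a sentence, so perturbation rates like `p_delete` usually need to be kept small and checked against the task.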

Code & Models

Repositories

jasonwei20/eda_nlp
