EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks

Jason Wei; Kai Zou

arXiv:1901.11196 · cs.CL · August 27, 2019 · 197 cites

TL;DR

This paper introduces EDA, a set of four simple data augmentation operations (synonym replacement, random insertion, random swap, and random deletion) that improve text classification performance, with particularly strong results on smaller datasets.

Contribution

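The paper proposes four easy-to-implement data augmentation operations that improve text classification performance for both convolutional and recurrent neural networks across five tasks, and it suggests parameters for practical use based on ablation studies.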

Findings

01

EDA improves performance on five text classification tasks.

02

On average across five datasets, training with EDA on only 50% of the training set matches the accuracy of normal training on all available data.

03

Techniques are effective for both CNNs and RNNs.

Abstract

We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion. On five text classification tasks, we show that EDA improves performance for both convolutional and recurrent neural networks. EDA demonstrates particularly strong results for smaller datasets; on average, across five datasets, training with EDA while using only 50% of the available training set achieved the same accuracy as normal training with all available data. We also performed extensive ablation studies and suggest parameters for practical use.
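The four operations named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' released implementation: the `SYNONYMS` table is a hypothetical toy thesaurus standing in for a real lexical resource, the function names and the `alpha` and `p_delete` parameters are illustrative, and the default values are placeholders rather than the settings the paper recommends.

```python
import random

# Hypothetical toy thesaurus (assumption); a real setup would query a
# lexical resource instead of a hand-written dictionary.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
    "movie": ["film"],
    "good": ["fine", "great"],
}

def synonym_replacement(words, n, rng):
    """Replace up to n words that have a thesaurus entry with a random synonym."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out

def random_insertion(words, n, rng):
    """Up to n times, insert a synonym of a random word at a random position."""
    out = list(words)
    for _ in range(n):
        candidates = [w for w in out if w in SYNONYMS]
        if not candidates:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out

def random_swap(words, n, rng):
    """Swap the positions of two randomly chosen words, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(words, p, rng):
    """Drop each word independently with probability p, keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]

def eda(sentence, alpha=0.1, p_delete=0.1, seed=0):
    """Return one augmented sentence per operation (illustrative defaults)."""
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))  # number of words to change per operation
    return [
        " ".join(synonym_replacement(words, n, rng)),
        " ".join(random_insertion(words, n, rng)),
        " ".join(random_swap(words, n, rng)),
        " ".join(random_deletion(words, p_delete, rng)),
    ]

if __name__ == "__main__":
    for augmented in eda("the quick movie was good"):
        print(augmented)
```

Each function copies its input list, so the original sentence is left intact and the augmented variants can be added to the training set under the original label.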

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies