EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
Jason Wei, Kai Zou

TL;DR
This paper introduces EDA, a set of four simple data augmentation operations (synonym replacement, random insertion, random swap, and random deletion) that improve text classification performance, with the strongest gains on smaller datasets.
Contribution
The paper proposes four easy-to-implement data augmentation methods that enhance text classification accuracy across various models and datasets.
Findings
EDA improves performance on five text classification tasks.
On average across the five datasets, training with EDA on only 50% of the training set matched the accuracy of normal training on all available data.
Techniques are effective for both CNNs and RNNs.
Abstract
We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion. On five text classification tasks, we show that EDA improves performance for both convolutional and recurrent neural networks. EDA demonstrates particularly strong results for smaller datasets; on average, across five datasets, training with EDA while using only 50% of the available training set achieved the same accuracy as normal training with all available data. We also performed extensive ablation studies and suggest parameters for practical use.
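The four operations named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say where synonyms come from, so the small TOY_SYNONYMS table is a hypothetical stand-in. Random insertion is assumed here to mean inserting a synonym of a word already in the sentence, and random deletion is assumed to drop each word independently with probability p. The function names and the parameters n, p, and rng are illustrative choices.

```python
import random

# Hypothetical toy synonym table (an assumption for this sketch);
# a real setup would draw synonyms from a thesaurus or lexical database.
TOY_SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "movie": ["film"],
}


def synonym_replacement(words, n, rng, synonyms=TOY_SYNONYMS):
    """Replace up to n words that have an entry in the synonym table."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


def random_insertion(words, n, rng, synonyms=TOY_SYNONYMS):
    """n times: insert a synonym of a random known word at a random position."""
    out = list(words)
    for _ in range(n):
        candidates = [w for w in out if w in synonyms]
        if not candidates:
            break
        new_word = rng.choice(synonyms[rng.choice(candidates)])
        # randint is inclusive on both ends, so the word may also be appended.
        out.insert(rng.randint(0, len(out)), new_word)
    return out


def random_swap(words, n, rng):
    """n times: swap the words at two distinct random positions."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """Drop each word independently with probability p; never return an empty list."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(0)  # seeded so repeated runs give the same augmentations
sentence = "the quick movie made me happy".split()
print(" ".join(synonym_replacement(sentence, 2, rng)))
print(" ".join(random_insertion(sentence, 1, rng)))
print(" ".join(random_swap(sentence, 1, rng)))
print(" ".join(random_deletion(sentence, 0.2, rng)))
```

Each function returns a new token list and leaves the input unchanged, so several augmented copies of the same sentence can be generated from one original. The values of n and p used above are arbitrary; the abstract notes that the paper suggests parameters for practical use.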
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
