AEDA: An Easier Data Augmentation Technique for Text Classification

Akbar Karimi; Leonardo Rossi; Andrea Prati

arXiv:2108.13230·cs.CL·August 31, 2021


TL;DR

This paper introduces AEDA, a simple data augmentation method for text classification that only inserts punctuation marks at random positions in the input. It outperforms the more complex EDA method on five datasets, which the authors attribute to AEDA preserving all of the input information.

Contribution

AEDA offers a straightforward, effective data augmentation technique that improves text classification performance while keeping every original word, in its original order, intact.
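The insertion step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the six punctuation marks ('.', ';', '?', ':', '!', ',') and the cap of roughly one third of the word count on the number of inserted marks are assumptions, since this page does not specify them, and the names `aeda`, `PUNCTUATIONS`, and `punc_ratio` are hypothetical.

```python
import random
from typing import Optional

# Assumption: this punctuation set is not specified on this page.
PUNCTUATIONS = [".", ";", "?", ":", "!", ","]


def aeda(sentence: str, punc_ratio: float = 0.3,
         rng: Optional[random.Random] = None) -> str:
    """Hypothetical AEDA-style augmenter: insert random punctuation marks.

    Only insertion happens, so every original word is kept in its original
    order; the words merely shift to new positions in the token sequence.
    """
    rng = rng or random.Random()
    words = sentence.split()
    if not words:
        return sentence
    # Assumption: insert between 1 and about punc_ratio * len(words) marks.
    max_marks = max(1, int(punc_ratio * len(words)))
    num_marks = rng.randint(1, max_marks)
    # Distinct word indices; a mark is placed directly in front of each one.
    positions = set(rng.sample(range(len(words)), num_marks))
    out = []
    for i, word in enumerate(words):
        if i in positions:
            out.append(rng.choice(PUNCTUATIONS))
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    # The example sentence contains no standalone punctuation tokens,
    # so filtering the marks back out must give the original words.
    original = "a sad superior human comedy played out on the back roads of life"
    augmented = aeda(original, rng=rng)
    print(augmented)
    restored = [t for t in augmented.split() if t not in PUNCTUATIONS]
    assert restored == original.split()
```

Because the sketch only adds tokens, removing the inserted marks recovers the original word sequence exactly; no word is dropped or reordered.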

Findings

1. AEDA outperforms EDA on all five datasets tested.

2. AEDA preserves input information better than EDA, since it never deletes words.

3. AEDA is easier to implement than EDA.

Abstract

This paper proposes AEDA (An Easier Data Augmentation), a technique to help improve performance on text classification tasks. AEDA consists only of randomly inserting punctuation marks into the original text. It is easier to implement than the EDA method (Wei and Zou, 2019), with which we compare our results. In addition, it keeps the order of the words while changing their positions in the sentence, leading to better generalization. Furthermore, the deletion operation in EDA can cause a loss of information, which in turn misleads the network, whereas AEDA preserves all the input information. Following the baseline, we perform experiments on five different text classification datasets. We show that models trained on AEDA-augmented data outperform those trained on EDA-augmented data on all five datasets. The…
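For contrast, the deletion operation that the abstract blames for information loss can be sketched as EDA-style random deletion (Wei and Zou, 2019), in which each word is removed independently with probability p. This is an illustrative sketch, not the EDA authors' code: the function name `random_deletion`, the default p = 0.1, and the fallback of keeping one random word when everything would be deleted are assumptions.

```python
import random
from typing import List, Optional


def random_deletion(words: List[str], p: float = 0.1,
                    rng: Optional[random.Random] = None) -> List[str]:
    """Hypothetical EDA-style random deletion.

    Each word is dropped independently with probability p; surviving words
    keep their relative order. Unlike punctuation insertion, dropped words
    are gone from the training example.
    """
    rng = rng or random.Random()
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    # Assumption: never return an empty sentence; keep one random word.
    if not kept:
        return [rng.choice(words)]
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    words = "not a good movie at all".split()
    deleted = random_deletion(words, p=0.3, rng=rng)
    lost = [w for w in words if w not in deleted]
    print(deleted, "lost:", lost)
```

In this illustrative sentence, a run that happens to drop "not" turns a negative review into an apparently positive one while the label stays negative, which is the kind of misleading example the abstract describes.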


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies

Methods: An Easier Data Augmentation