EPiDA: An Easy Plug-in Data Augmentation Framework for High Performance Text Classification
Minyi Zhao, Lu Zhang, Yi Xu, Jiandong Ding, Jihong Guan, Shuigeng Zhou

TL;DR
EPiDA is a plug-in data augmentation framework for text classification. It uses two entropy-based mechanisms to balance the diversity and the semantic consistency of augmented data, and it improves classification performance without requiring pre-trained generation networks.
Contribution
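EPiDA introduces two entropy-based mechanisms to control data generation: relative entropy maximization (REM), which enhances the diversity of augmented data, and conditional entropy minimization (CEM), which ensures their semantic consistency.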
Findings
EPiDA outperforms state-of-the-art methods in most cases.
It works well with various DA algorithms and classifiers.
It does not require pre-trained generation networks.
Abstract
Recent works have empirically shown the effectiveness of data augmentation (DA) in NLP tasks, especially for those suffering from data scarcity. Intuitively, given the size of generated data, their diversity and quality are crucial to the performance of targeted tasks. However, to the best of our knowledge, most existing methods consider only either the diversity or the quality of augmented data, thus cannot fully mine the potential of DA for NLP. In this paper, we present an easy and plug-in data augmentation framework EPiDA to support effective text classification. EPiDA employs two mechanisms: relative entropy maximization (REM) and conditional entropy minimization (CEM) to control data generation, where REM is designed to enhance the diversity of augmented data while CEM is exploited to ensure their semantic consistency. EPiDA can support efficient and continuous data generation for…
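The selection idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the function names (`kl_divergence`, `entropy`, `select_augmentations`) are hypothetical, the classifier's prediction on the original sample is used as the reference distribution for the REM term, the CEM term is taken to be the entropy of the prediction on each augmented candidate, and the two terms are min-max normalized and weighted equally.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(pi * math.log(pi + eps) for pi in p if pi > 0)


def min_max(values):
    """Rescale values to [0, 1]; return all zeros if they are identical."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_augmentations(orig_probs, cand_probs, k):
    """Return indices of the k candidates with the highest combined score.

    orig_probs: class probabilities predicted for the original sample.
    cand_probs: list of class-probability vectors, one per augmented candidate.
    Assumed scoring: normalized REM term (higher = more diverse) minus
    normalized CEM term (higher entropy = less semantically certain).
    """
    rem = [kl_divergence(orig_probs, q) for q in cand_probs]  # diversity proxy
    cem = [entropy(q) for q in cand_probs]                    # uncertainty proxy
    rem_n, cem_n = min_max(rem), min_max(cem)
    scores = [r - c for r, c in zip(rem_n, cem_n)]
    ranked = sorted(range(len(cand_probs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

Because the scoring needs only classifier outputs, a filter of this shape could sit behind any augmentation routine that produces candidates, which matches the plug-in framing above. The equal weighting of the two terms is a simplification chosen for this sketch.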
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Machine Learning and Data Classification
Methods: Convolution · Dense Connections · Q-Learning · Deep Q-Network · Random Ensemble Mixture
