
TL;DR
This paper introduces GE3, a lightweight data augmentation method that extrapolates hidden space distributions between classes, improving text classification performance especially in imbalanced data scenarios.
Contribution
The paper proposes a novel, hyperparameter-free data augmentation technique called GE3 that leverages hidden space extrapolation for better text classification.
Findings
GE3 improves performance more than upsampling and other hidden-space data augmentation methods.
GE3 is effective across three text classification datasets and various data imbalance scenarios.
The method is simple, lightweight, and hyperparameter-free.
Abstract
This paper asks whether extrapolating the hidden space distribution of text examples from one class onto another is a valid inductive bias for data augmentation. To operationalize this question, I propose a simple data augmentation protocol called "good-enough example extrapolation" (GE3). GE3 is lightweight and has no hyperparameters. Applied to three text classification datasets for various data imbalance scenarios, GE3 improves performance more than upsampling and other hidden-space data augmentation methods.
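The abstract does not spell out the extrapolation step, so the following is only a minimal sketch of one plausible reading: each hidden vector from a source class is shifted by the difference between the target class mean and the source class mean, so the source class's spread is reproduced around the target class. The function names `class_mean` and `extrapolate` and the toy vectors are hypothetical and introduced purely for illustration; real hidden vectors would come from a text encoder.

```python
from statistics import fmean


def class_mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def extrapolate(source_vectors, target_vectors):
    """Shift each source-class vector by (target mean - source mean).

    Assumption: this is one reading of "extrapolating the hidden space
    distribution from one class onto another", not a confirmed
    reproduction of the paper's procedure.
    """
    source_mean = class_mean(source_vectors)
    target_mean = class_mean(target_vectors)
    return [
        [x - s + t for x, s, t in zip(vec, source_mean, target_mean)]
        for vec in source_vectors
    ]


# Toy 2-D "hidden vectors" (hypothetical values).
majority = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]  # well-represented class
minority = [[10.0, 10.0]]                        # under-represented class

# Synthetic minority-class examples that inherit the majority class's spread.
augmented = extrapolate(majority, minority)
print(augmented)
```

Under this reading there is nothing to tune: the only quantities involved are the two class means, which is consistent with the claim that the method has no hyperparameters.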
Taxonomy
Topics: Text and Document Classification Technologies · Natural Language Processing Techniques · Imbalanced Data Classification Techniques
