Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers
Markus Bayer, Marc-André Kaufhold, Björn Buchhold, Marcel Keller, Jörg Dallmeyer, Christian Reuter

TL;DR
This paper introduces a novel text generation method for data augmentation in NLP that improves classifier performance on both long and short texts, with additive accuracy gains of up to 15.53% in a constructed low-data regime.
Contribution
The paper presents a new text generation approach tailored for NLP data augmentation, demonstrating its effectiveness across multiple datasets and tasks.
Findings
Additive accuracy gains of up to 15.53% in low-data regimes
F1-score improvements of up to 4.84 in real-world tasks
Effective across 11 diverse datasets
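The term "additive" in the findings above is easy to misread, so here is a minimal sketch of the arithmetic. It assumes "additive gain" means the plain difference between two accuracy values in percentage points, as opposed to a relative (proportional) improvement. The function names and the example accuracies are hypothetical, chosen only for illustration, and are not results from the paper.

```python
def additive_gain(baseline_acc: float, augmented_acc: float) -> float:
    """Difference of two accuracies given in percent (percentage points)."""
    return augmented_acc - baseline_acc


def relative_gain(baseline_acc: float, augmented_acc: float) -> float:
    """Improvement expressed as a percentage of the baseline accuracy."""
    return (augmented_acc - baseline_acc) / baseline_acc * 100.0


# Hypothetical accuracies (in percent), NOT taken from the paper.
baseline = 60.0   # classifier trained without augmentation
augmented = 75.0  # classifier trained with augmented data

print(f"additive gain: {additive_gain(baseline, augmented):.2f} points")
print(f"relative gain: {relative_gain(baseline, augmented):.2f} %")
```

Under this reading, the same pair of accuracies yields a smaller additive figure than relative figure whenever the baseline is below 100%, so the two should not be compared directly.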
Abstract
In many cases of machine learning, research suggests that the development of training data might have a higher relevance than the choice and modelling of classifiers themselves. Thus, data augmentation methods have been developed to improve classifiers by artificially created training data. In NLP, there is the challenge of establishing universal rules for text transformations which provide new linguistic patterns. In this paper, we present and evaluate a text generation method suitable to increase the performance of classifiers for long and short texts. We achieved promising improvements when evaluating short as well as long text tasks with the enhancement by our text generation method. Especially with regard to small data analytics, additive accuracy gains of up to 15.53% and 3.56% are achieved within a constructed low data regime, compared to the no augmentation baseline and another…
