Data Augmentation for Emotion Detection in Small Imbalanced Text Data
Anna Koufakou, Diego Grisales, Ragy Costa de Jesus, Oscar Fox

TL;DR
This paper investigates how data augmentation techniques can improve emotion detection in small, imbalanced text datasets, demonstrating significant performance gains and exploring novel augmentation methods like GPT paraphrasing.
Contribution
It introduces and evaluates multiple data augmentation strategies specifically tailored for small, imbalanced emotion recognition datasets, including novel use of GPT-based paraphrasing.
Findings
Augmentation significantly improves model performance.
GPT-based paraphrasing shows promising results.
External data augmentation enhances emotion detection accuracy.
Abstract
Emotion recognition in text, the task of identifying emotions such as joy or anger, is a challenging problem in NLP with many applications. One of the challenges is the shortage of available datasets that have been annotated with emotions. Certain existing datasets are small, follow different emotion taxonomies and display imbalance in their emotion distribution. In this work, we studied the impact of data augmentation techniques precisely when applied to small imbalanced datasets, for which current state-of-the-art models (such as RoBERTa) under-perform. Specifically, we utilized four data augmentation methods (Easy Data Augmentation (EDA), static embedding-based, contextual embedding-based, and ProtAugment) on three datasets that come from different sources and vary in size, emotion categories and distributions. Our experimental results show that using the augmented data when training the classifier…
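To make the idea of augmenting a small imbalanced dataset concrete, here is a minimal sketch in Python. It is not the authors' implementation: it uses only two EDA-style operations (random swap and random deletion), and the helper names (`random_swap`, `random_deletion`, `augment_text`, `balance_with_augmentation`) as well as the choice to top up every class to the size of the largest class are assumptions made for illustration.

```python
import random
from collections import Counter


def random_swap(tokens, rng, n_swaps=1):
    """Swap two randomly chosen positions, n_swaps times (EDA-style operation)."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, rng, p=0.1):
    """Drop each token with probability p, always keeping at least one token."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def augment_text(text, rng):
    """Apply one randomly chosen operation to a whitespace-tokenized text."""
    tokens = text.split()
    op = rng.choice([random_swap, random_deletion])
    return " ".join(op(tokens, rng))


def balance_with_augmentation(texts, labels, seed=0):
    """Hypothetical helper: add augmented copies of minority-class texts
    until every class has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, l in zip(texts, labels) if l == label]
        for _ in range(target - count):
            out_texts.append(augment_text(rng.choice(pool), rng))
            out_labels.append(label)
    return out_texts, out_labels


if __name__ == "__main__":
    texts = ["i am so happy today", "what a great day", "this makes me angry"]
    labels = ["joy", "joy", "anger"]
    new_texts, new_labels = balance_with_augmentation(texts, labels)
    print(Counter(new_labels))
```

In practice, augmented examples would be added only to the training split, so that evaluation still reflects the original data. Full EDA also includes synonym replacement and random insertion, which need a thesaurus and are omitted here.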
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies · Topic Modeling
