AugGPT: Leveraging ChatGPT for Text Data Augmentation
Haixing Dai, Zhengliang Liu, Wenxiong Liao, Xiaoke Huang, Yihan Cao,, Zihao Wu, Lin Zhao, Shaochen Xu, Wei Liu, Ninghao Liu, Sheng Li, Dajiang Zhu,, Hongmin Cai, Lichao Sun, Quanzheng Li, Dinggang Shen, Tianming Liu, and Xiang, Li

TL;DR
AugGPT leverages ChatGPT to generate diverse, faithful, and semantically varied text samples for data augmentation, significantly improving few-shot text classification performance.
Contribution
This paper introduces AugGPT, a novel method using ChatGPT for effective text data augmentation that enhances diversity and faithfulness in few-shot learning scenarios.
Findings
AugGPT outperforms existing augmentation methods in accuracy.
Augmented samples show increased diversity and semantic variation.
Improved model robustness in low-data regimes.
Abstract
Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning scenario, where the data in the target domain is generally much scarcer and of lowered quality. A natural and widely-used strategy to mitigate such challenges is to perform data augmentation to better capture the data invariance and increase the sample size. However, current text data augmentation methods either can't ensure the correct labeling of the generated data (lacking faithfulness) or can't ensure sufficient diversity in the generated data (lacking compactness), or both. Inspired by the recent success of large language models, especially the development of ChatGPT, which demonstrated improved language comprehension abilities, in this work, we propose a text data…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · COVID-19 diagnosis using AI
