PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks
Yufei Wang, Can Xu, Qingfeng Sun, Huang Hu, Chongyang Tao, Xiubo Geng, Daxin Jiang

TL;DR
PromDA is a prompt-based data augmentation method for low-resource NLU tasks. It generates high-quality synthetic training data without extensive human effort or unlabeled in-domain data, and this data improves NLU model performance.
Contribution
The paper presents a novel prompt-based data augmentation approach that trains only soft prompts inside frozen PLMs. This avoids the human effort of collecting unlabeled in-domain data and effectively boosts low-resource NLU performance.
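The idea of training only a soft prompt inside a frozen PLM can be illustrated with a toy, dependency-free sketch. It assumes input-level prompt tuning, where trainable vectors are prepended to the token embeddings; PromDA's actual prompt placement, initialization, and backbone are not described on this page and may differ. All function names (`init_soft_prompt`, `build_model_input`, `count_trainable`, `sgd_step`) are hypothetical.

```python
import random
from typing import Dict, List

Vector = List[float]


def init_soft_prompt(num_tokens: int, dim: int, seed: int = 0) -> List[Vector]:
    """Create `num_tokens` trainable prompt vectors of size `dim` (toy random init)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(num_tokens)]


def build_model_input(
    soft_prompt: List[Vector],
    frozen_embeddings: Dict[int, Vector],
    token_ids: List[int],
) -> List[Vector]:
    """Prepend the soft prompt to the frozen PLM's embeddings of the input tokens."""
    # The embedding table is only read here; it is never updated.
    token_vectors = [frozen_embeddings[t] for t in token_ids]
    return soft_prompt + token_vectors


def count_trainable(soft_prompt: List[Vector]) -> int:
    """Only the soft prompt is trainable; the PLM's own weights are excluded."""
    return sum(len(v) for v in soft_prompt)


def sgd_step(soft_prompt: List[Vector], grads: List[Vector], lr: float) -> List[Vector]:
    """Apply one SGD update to the soft prompt only (gradients assumed to be given)."""
    return [[p - lr * g for p, g in zip(pv, gv)] for pv, gv in zip(soft_prompt, grads)]
```

With `dim = 4`, a 3-vector prompt, and input token ids `[1, 2]`, `build_model_input` returns 5 vectors (the 3 prompt vectors followed by the 2 token embeddings), and `count_trainable` reports 12 parameters. Because the update touches only the prompt, the number of trained parameters stays small regardless of the PLM's size.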
Findings
PromDA outperforms several competitive baselines on four benchmark datasets, including a state-of-the-art semi-supervised model that uses unlabeled in-domain data.
Synthetic data from PromDA complements unlabeled in-domain data.
Combining PromDA with unlabeled data further improves NLU model accuracy.
Abstract
This paper focuses on data augmentation for low-resource Natural Language Understanding (NLU) tasks. We propose the Prompt-based Data Augmentation model (PromDA), which trains only a small-scale soft prompt (i.e., a set of trainable vectors) in frozen Pre-trained Language Models (PLMs). This avoids the human effort of collecting unlabeled in-domain data and maintains the quality of the generated synthetic data. In addition, PromDA generates synthetic data via two different views and filters out low-quality data using NLU models. Experiments on four benchmarks show that the synthetic data produced by PromDA successfully boost the performance of NLU models, which consistently outperform several competitive baseline models, including a state-of-the-art semi-supervised model that uses unlabeled in-domain data. The synthetic data from PromDA are also complementary to unlabeled in-domain data. The…
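The abstract says PromDA filters out low-quality synthetic data using NLU models. One plausible reading, sketched here as an assumption rather than the paper's exact criterion, is consistency filtering: an NLU model trained on the few labeled examples re-labels each synthetic example, which is kept only if the predicted label matches the label it was generated for with enough confidence. `filter_synthetic`, `toy_predict`, and `min_confidence` are hypothetical.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (synthetic text, label it was generated for)
Predictor = Callable[[str], Tuple[str, float]]  # text -> (predicted label, confidence)


def filter_synthetic(
    examples: List[Example], predict: Predictor, min_confidence: float = 0.5
) -> List[Example]:
    """Keep synthetic examples whose NLU prediction agrees with their label."""
    kept = []
    for text, label in examples:
        pred_label, confidence = predict(text)
        if pred_label == label and confidence >= min_confidence:
            kept.append((text, label))
    return kept


def toy_predict(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for an NLU model trained on the few-shot data."""
    if "great" in text:
        return "positive", 0.9
    if "awful" in text:
        return "negative", 0.8
    return "negative", 0.4


synthetic = [
    ("a great movie", "positive"),  # agrees and confident -> kept
    ("an awful plot", "positive"),  # prediction disagrees -> dropped
    ("it was fine", "negative"),    # agrees, but 0.4 < 0.5 -> dropped
]
print(filter_synthetic(synthetic, toy_predict))
# → [('a great movie', 'positive')]
```

Lowering `min_confidence` trades precision for recall: at 0.3 the third example would also be kept, while the mislabeled second example is still rejected.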
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
