PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks

Yufei Wang; Can Xu; Qingfeng Sun; Huang Hu; Chongyang Tao; Xiubo Geng; Daxin Jiang

arXiv:2202.12499·cs.CL·March 18, 2022·1 cites

TL;DR

PromDA introduces a prompt-based data augmentation method that enhances low-resource NLU tasks by generating high-quality synthetic data without requiring extensive human effort or unlabeled data, leading to improved model performance.

Contribution

The paper presents a novel prompt-based data augmentation approach using soft prompts within frozen PLMs, avoiding human data collection and effectively boosting low-resource NLU performance.
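As a rough illustration of what "soft prompts within frozen PLMs" means, the sketch below prepends a small set of trainable vectors to the output of a frozen embedding table. It is a toy stand-in in plain Python, not the PromDA implementation: the class names `FrozenEmbedding` and `SoftPrompt`, the sizes, and the choice to attach the prompt only at the input layer are all assumptions made for illustration.

```python
import random
from typing import List

Vector = List[float]


class FrozenEmbedding:
    """Hypothetical stand-in for a PLM embedding table; it is never updated."""

    def __init__(self, vocab_size: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.table = [[rng.uniform(-1, 1) for _ in range(dim)]
                      for _ in range(vocab_size)]

    def lookup(self, token_ids: List[int]) -> List[Vector]:
        return [self.table[i] for i in token_ids]


class SoftPrompt:
    """The only trainable part: a short sequence of free vectors."""

    def __init__(self, length: int, dim: int, seed: int = 1) -> None:
        rng = random.Random(seed)
        self.vectors = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                        for _ in range(length)]

    def num_parameters(self) -> int:
        return len(self.vectors) * len(self.vectors[0])


def build_input(prompt: SoftPrompt, emb: FrozenEmbedding,
                token_ids: List[int]) -> List[Vector]:
    # Prompt vectors come first, followed by the frozen token embeddings.
    return prompt.vectors + emb.lookup(token_ids)


embedding = FrozenEmbedding(vocab_size=100, dim=8)
prompt = SoftPrompt(length=5, dim=8)
inputs = build_input(prompt, embedding, token_ids=[3, 17, 42])

frozen_params = len(embedding.table) * 8
print(len(inputs), prompt.num_parameters(), frozen_params)
```

In this toy setup only the 40 prompt values would receive gradient updates, while the 800 embedding values stay fixed, which is the sense in which the trained component is small-scale.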

Findings

01

PromDA outperforms several baseline models on four benchmark datasets.

02

Synthetic data from PromDA complements unlabeled in-domain data.

03

Combining PromDA with unlabeled data further improves NLU model accuracy.

Abstract

This paper focuses on Data Augmentation for low-resource Natural Language Understanding (NLU) tasks. We propose the Prompt-based Data Augmentation model (PromDA), which trains only a small-scale Soft Prompt (i.e., a set of trainable vectors) in frozen Pre-trained Language Models (PLMs). This avoids human effort in collecting unlabeled in-domain data and maintains the quality of the generated synthetic data. In addition, PromDA generates synthetic data via two different views and filters out low-quality data using NLU models. Experiments on four benchmarks show that synthetic data produced by PromDA successfully boost the performance of NLU models, which consistently outperform several competitive baseline models, including a state-of-the-art semi-supervised model using unlabeled in-domain data. The synthetic data from PromDA are also complementary with unlabeled in-domain data. The…
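The abstract says low-quality synthetic data are filtered out using NLU models but does not spell out the criterion here. One plausible reading, sketched below as an assumption, is a consistency check: keep a synthetic example only when an NLU model's prediction agrees with the label the generator attached to it. The function `consistency_filter`, the keyword-based `toy_predict` classifier, and the sample sentences are all hypothetical and exist only to make the example runnable.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (text, generated label)


def consistency_filter(synthetic: List[Example],
                       predict: Callable[[str], str]) -> List[Example]:
    """Keep examples whose generated label matches the model's prediction."""
    return [(text, label) for text, label in synthetic
            if predict(text) == label]


def toy_predict(text: str) -> str:
    # Hypothetical stand-in for a trained NLU classifier.
    return "positive" if "good" in text.lower() else "negative"


synthetic = [
    ("A good, heartfelt film.", "positive"),
    ("Dull and far too long.", "negative"),
    ("Good acting saves the plot.", "negative"),  # label disagrees with model
]

kept = consistency_filter(synthetic, toy_predict)
for text, label in kept:
    print(label, "|", text)
```

Here the third example is dropped because the stand-in classifier predicts "positive" while the generated label is "negative". In practice the classifier would be an NLU model trained on the available labeled data rather than a keyword rule.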

Code & Models

Repositories

garyyufei/promda
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications