Leveraging QA Datasets to Improve Generative Data Augmentation
Dheeraj Mekala, Tu Vu, Timo Schick, Jingbo Shang

TL;DR
This paper introduces CONDA, a method that leverages question-answer (QA) datasets to enhance generative data augmentation by reformulating data generation as context generation for a given QA pair, leading to substantial performance improvements on classification tasks.
Contribution
The paper proposes training context generators on QA datasets and adapting them to the target task domain, which improves generative data augmentation for downstream classification tasks in few- and zero-shot settings.
Findings
QA datasets with high-level reasoning improve augmentation effectiveness
Substantial performance gains in classification tasks
Effective in both few-shot and zero-shot settings
Abstract
The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation. In this work, we propose CONDA, an approach to further improve GLMs' ability to generate synthetic data by reformulating data generation as context generation for a given question-answer (QA) pair and leveraging QA datasets for training context generators. Then, we cast downstream tasks into the same question answering format and adapt the fine-tuned context generators to the target task domain. Finally, we use the fine-tuned GLM to generate relevant contexts, which are in turn used as synthetic training data for their corresponding tasks. We perform extensive experiments on multiple classification datasets and demonstrate substantial improvements in performance for both few- and zero-shot settings. Our analysis reveals…
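The reformulation described in the abstract can be illustrated with a minimal Python sketch. It shows only the data formatting: a labeled classification example is cast as a QA triple, the (question, answer) pair becomes the input to a context generator, and the text becomes the target. The prompt template, the example question, and all function names here are hypothetical and chosen for illustration; they are not taken from the paper, and no model is trained or called.

```python
from dataclasses import dataclass


@dataclass
class QAExample:
    question: str
    answer: str
    context: str


def classification_to_qa(text: str, label: str, question: str) -> QAExample:
    """Cast a labeled classification example into QA format.

    The label plays the role of the answer and the input text plays the
    role of the context. The question wording is an assumption.
    """
    return QAExample(question=question, answer=label, context=text)


def generator_input(question: str, answer: str) -> str:
    """Build the source string for a context generator.

    The 'question: ... answer: ... context:' template is hypothetical.
    """
    return f"question: {question} answer: {answer} context:"


def training_pair(example: QAExample) -> tuple[str, str]:
    """Return (source, target) for fine-tuning a context generator:
    given the QA pair, the model learns to produce the context."""
    return generator_input(example.question, example.answer), example.context


def synthetic_example(label: str, generated_context: str) -> tuple[str, str]:
    """Turn a generated context back into a (text, label) training example
    for the downstream classifier."""
    return generated_context, label


if __name__ == "__main__":
    ex = classification_to_qa(
        text="The film was a delight.",
        label="positive",
        question="Is this review positive or negative?",
    )
    source, target = training_pair(ex)
    print(source)
    print(target)
```

At generation time, the same `generator_input` string would be built for each label of the target task, and the contexts sampled from the fine-tuned generator would be paired with that label via `synthetic_example`.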
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
