Prompting-based Synthetic Data Generation for Few-Shot Question Answering
Maximilian Schmidt, Andrea Bartezzaghi, Ngoc Thang Vu

TL;DR
This paper demonstrates that prompting large language models to generate synthetic data can significantly improve few-shot question answering performance across multiple datasets, reducing reliance on costly manual annotation.
Contribution
The study introduces a prompting-based data generation method that leverages language models' domain-agnostic knowledge to enhance few-shot question answering, outperforming existing approaches.
Findings
Prompting large language models improves QA performance in few-shot settings.
Synthetic data generated via prompting outperforms previous methods.
The approach reduces the need for extensive manual annotation.
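The prompting-based generation summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the template wording, the `build_prompt` and `generate_examples` helpers, and the `generate` callable (a stand-in for any language model call) are all assumptions made for illustration. The sketch fills a prompt with a context passage and a candidate answer span, asks the model for a matching question, and stores the result as an extractive QA record.

```python
from typing import Callable, Dict, List

# Hypothetical prompt template; the wording used in the paper is not shown on this page.
TEMPLATE = "Context: {context}\nAnswer: {answer}\nQuestion:"


def build_prompt(context: str, answer: str) -> str:
    """Fill the template with a passage and a candidate answer span."""
    return TEMPLATE.format(context=context, answer=answer)


def generate_examples(
    context: str,
    answers: List[str],
    generate: Callable[[str], str],
) -> List[Dict[str, object]]:
    """Create synthetic QA records for one passage.

    `generate` is an assumed interface: it takes a prompt string and
    returns the model's continuation (here, the question text).
    """
    examples: List[Dict[str, object]] = []
    for answer in answers:
        start = context.find(answer)
        if start == -1:
            # Skip candidates that are not literal spans of the passage.
            continue
        question = generate(build_prompt(context, answer)).strip()
        if not question:
            # Skip empty generations.
            continue
        examples.append(
            {
                "context": context,
                "question": question,
                "answer_text": answer,
                "answer_start": start,
            }
        )
    return examples
```

Any text generation backend can be passed as `generate`; the resulting records could then be used as additional training data for a few-shot QA model, with filtering steps added as needed.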
Abstract
Although language models (LMs) have boosted the performance of Question Answering, they still need plenty of data. Data annotation, in contrast, is a time-consuming process. This especially applies to Question Answering, where possibly large documents have to be parsed and annotated with questions and their corresponding answers. Furthermore, Question Answering models often only work well for the domain they were trained on. Since annotation is costly, we argue that domain-agnostic knowledge from LMs, such as linguistic understanding, is sufficient to create a well-curated dataset. With this motivation, we show that using large language models can improve Question Answering performance on various datasets in the few-shot setting compared to state-of-the-art approaches. For this, we perform data generation leveraging the Prompting framework, suggesting that language models contain…
Taxonomy
Topics: Topic Modeling · Seismology and Earthquake Studies · Speech and dialogue systems
