End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems
Siamak Shakeri, Cicero Nogueira dos Santos, Henry Zhu, Patrick Ng, Feng Nan, Zhiguo Wang, Ramesh Nallapati, Bing Xiang

TL;DR
This paper introduces an end-to-end transformer-based method for generating synthetic question-answer pairs to improve domain adaptation in QA systems, eliminating the need for separate filtering models and achieving state-of-the-art results.
Contribution
The authors develop a unified transformer model trained end-to-end for QA data generation, streamlining the process and enhancing domain adaptation performance.
Findings
Significant improvements over existing methods in domain adaptation of QA models.
The end-to-end generator effectively filters generated data without separate models.
Achieves state-of-the-art results on benchmark datasets.
Abstract
We propose an end-to-end approach for synthetic QA data generation. Our model comprises a single transformer-based encoder-decoder network that is trained end-to-end to generate both answers and questions. In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token-by-token. The likelihood produced in the generation process is used as a filtering score, which avoids the need for a separate filtering model. Our generator is trained by fine-tuning a pretrained LM using maximum likelihood estimation. The experimental results indicate significant improvements in the domain adaptation of QA models, outperforming current state-of-the-art methods.
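The likelihood-based filtering step described in the abstract can be illustrated with a minimal sketch. It assumes the generator exposes per-token log-probabilities for each generated question-answer pair and that candidates are ranked by the sum of those log-probabilities; the abstract does not say exactly which tokens contribute to the score or how the cutoff is chosen, so the `Candidate` class, its `token_logprobs` field, and the `keep` parameter are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One generated QA pair plus the decoder's per-token log-probabilities."""
    question: str
    answer: str
    # Hypothetical field: log p(token | passage, previous tokens) for each generated token.
    token_logprobs: List[float]


def sequence_log_likelihood(candidate: Candidate) -> float:
    """Sum of token log-probabilities, i.e. the log-likelihood of the generated sequence."""
    return sum(candidate.token_logprobs)


def filter_by_likelihood(candidates: List[Candidate], keep: int) -> List[Candidate]:
    """Keep the `keep` candidates the generator itself scored as most likely.

    Assumption: a simple top-k cut on the summed log-likelihood. No separate
    filtering model is involved; the score comes from generation itself.
    """
    ranked = sorted(candidates, key=sequence_log_likelihood, reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    # Made-up log-probabilities, purely for demonstration.
    samples = [
        Candidate("Who wrote the report?", "the committee", [-0.1, -0.2]),
        Candidate("What is it?", "something", [-2.0, -1.5]),
        Candidate("When was it published?", "in 2019", [-0.5]),
    ]
    for c in filter_by_likelihood(samples, keep=2):
        print(c.question, "->", c.answer)
```

A summed score favors shorter sequences; dividing by the number of tokens is a common alternative, though whether that matches the paper's choice is not stated here.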
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
