An Empirical Comparison of LM-based Question and Answer Generation Methods
Asahi Ushio, Fernando Alva-Manchego, Jose Camacho-Collados

TL;DR
This paper empirically compares three sequence-to-sequence language model-based question-answer generation methods, demonstrating that a lightweight end-to-end model is robust and effective, with generated data aiding QA model training.
Contribution
It provides a comprehensive baseline comparison of LM-based QAG methods and shows the effectiveness of generated data for training QA models.
Findings
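The end-to-end QAG model is generally robust and outperforms more complex approaches.
QA models fine-tuned solely on generated question-answer pairs can be competitive with supervised QA models trained on human-labeled data.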
Performance varies depending on the underlying language model.
Abstract
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context (e.g. a paragraph). This task has a variety of applications, such as data augmentation for question answering (QA) models, information retrieval and education. In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning. Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches. However, there are differences depending on the underlying generative LM. Finally, our analysis shows that QA models fine-tuned solely on generated question-answer pairs can be competitive when compared to supervised QA models trained on human-labeled data.
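To make the task definition concrete, here is a minimal sketch of what "a set of question-answer pairs given a context" could look like once a sequence-to-sequence model has produced its output. It assumes the model emits all pairs as one flattened string; the `question: ..., answer: ...` template, the ` | ` separator, and the helper name `parse_qa_pairs` are illustrative assumptions, not details stated in this summary.

```python
from typing import List, Tuple


def parse_qa_pairs(generated: str, pair_sep: str = " | ") -> List[Tuple[str, str]]:
    """Split a flattened generation into (question, answer) pairs.

    The 'question: ..., answer: ...' template is an assumed output
    format used only for illustration.
    """
    pairs = []
    for chunk in generated.split(pair_sep):
        chunk = chunk.strip()
        # Skip chunks that do not follow the assumed template.
        if not chunk.startswith("question:") or ", answer:" not in chunk:
            continue
        question_part, answer_part = chunk.split(", answer:", 1)
        question = question_part[len("question:"):].strip()
        answer = answer_part.strip()
        if question and answer:
            pairs.append((question, answer))
    return pairs


# Hypothetical model output for a short paragraph about Hamlet.
example = (
    "question: Who wrote Hamlet?, answer: William Shakespeare | "
    "question: When was it written?, answer: around 1600 | "
    "broken chunk"
)
pairs = parse_qa_pairs(example)
```

Malformed chunks are dropped rather than repaired, which keeps the resulting pairs clean if they are later used as training data for a QA model.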
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
