Conversational QA Dataset Generation with Answer Revision
Seonjeong Hwang, Gary Geunbae Lee

TL;DR
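This paper presents a framework for generating large-scale conversational question-answer datasets. It extracts question-worthy phrases from a passage, generates questions conditioned on the preceding conversation, and then revises the extracted answers so they match the generated questions. Answer revision improves synthetic data quality, and the framework supports domain adaptation.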
Contribution
The paper introduces a novel answer revision step that improves the quality of synthetic data, and shows that the framework can be used effectively for domain adaptation of conversational QA.
Findings
Answer revision significantly improves data quality
Framework effectively adapts to new domains
Synthetic data enhances conversational QA performance
Abstract
Conversational question-answer generation is a task that automatically generates a large-scale conversational question answering dataset based on input passages. In this paper, we introduce a novel framework that extracts question-worthy phrases from a passage and then generates corresponding questions considering previous conversations. In particular, our framework revises the extracted answers after generating questions so that the answers exactly match the paired questions. Experimental results show that our simple answer revision approach leads to significant improvement in the quality of synthetic data. Moreover, we prove that our framework can be effectively utilized for domain adaptation of conversational question answering.
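To make the ordering of the three stages concrete, here is a minimal Python sketch of the loop the abstract describes: extract a question-worthy phrase, generate a question given the conversation so far, then revise the answer to fit that question. Only the stage order follows the abstract. The component functions (`toy_extract`, `toy_ask`, `toy_revise`) and the `Turn` record are hypothetical word-overlap stand-ins, not the authors' models. The toy extractor deliberately picks over-long two-sentence phrases so that the revision step has something to correct.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    question: str
    extracted: str  # phrase chosen before the question was written
    answer: str     # answer after revision


# Hypothetical component signatures (assumptions, not the paper's models).
Extractor = Callable[[str, List[Turn]], str]
QuestionGen = Callable[[str, str, List[Turn]], str]
Reviser = Callable[[str, str, List[Turn]], str]


def generate_conversation(passage: str, num_turns: int, extract: Extractor,
                          ask: QuestionGen, revise: Reviser) -> List[Turn]:
    """Extract, then generate a question, then revise the answer, one turn at a time."""
    history: List[Turn] = []
    for _ in range(num_turns):
        phrase = extract(passage, history)           # 1. question-worthy phrase
        if not phrase:
            break                                    # passage exhausted
        question = ask(passage, phrase, history)     # 2. question given the history
        answer = revise(passage, question, history)  # 3. re-select answer for the question
        history.append(Turn(question, phrase, answer))
    return history


# Toy stand-ins below are hypothetical word-overlap heuristics only.
def tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def split_sentences(passage: str) -> List[str]:
    return [s.strip() for s in passage.split(".") if s.strip()]


def toy_extract(passage: str, history: List[Turn]) -> str:
    # Deliberately over-long: takes two consecutive sentences per turn.
    sents = split_sentences(passage)
    start = 2 * len(history)
    return ". ".join(sents[start:start + 2])


def toy_ask(passage: str, phrase: str, history: List[Turn]) -> str:
    # Asks only about the last word of the phrase; "And" marks a follow-up turn.
    key = tokens(phrase)[-1]
    prefix = "And what" if history else "What"
    return f"{prefix} does the passage say about {key}?"


def toy_revise(passage: str, question: str, history: List[Turn]) -> str:
    # Picks the single sentence sharing the most words with the question.
    q = set(tokens(question))
    return max(split_sentences(passage), key=lambda s: len(q & set(tokens(s))))


if __name__ == "__main__":
    passage = ("Marie Curie was a physicist. She won two Nobel Prizes. "
               "She was born in Warsaw. She died in 1934.")
    for t in generate_conversation(passage, 5, toy_extract, toy_ask, toy_revise):
        print(f"Q: {t.question}\n  extracted: {t.extracted}\n  revised:   {t.answer}")
```

On this passage the sketch produces two turns. In each, the extracted phrase contains an extra sentence the question never asks about, while the revised answer is narrowed to the one sentence that does answer it. This mirrors, in miniature, why revising answers after question generation can tighten question-answer alignment; a real system would replace the toy functions with trained models.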
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
