Conversational QA Dataset Generation with Answer Revision

Seonjeong Hwang; Gary Geunbae Lee

arXiv:2209.11396·cs.CL·September 26, 2022

Open Access

TL;DR

This paper presents a new framework for generating large-scale conversational question-answer datasets by extracting question-worthy phrases, generating questions, and revising answers to improve data quality and domain adaptation.

Contribution

The paper introduces an answer revision method that improves the quality of synthetic data, and it shows that the framework can be used for domain adaptation in conversational QA.

Findings

01. Answer revision significantly improves data quality

02. Framework effectively adapts to new domains

03. Synthetic data enhances conversational QA performance

Abstract

Conversational question-answer generation is a task that automatically generates a large-scale conversational question answering dataset based on input passages. In this paper, we introduce a novel framework that extracts question-worthy phrases from a passage and then generates corresponding questions considering previous conversations. In particular, our framework revises the extracted answers after generating questions so that answers exactly match paired questions. Experimental results show that our simple answer revision approach leads to significant improvement in the quality of synthetic data. Moreover, we show that our framework can be effectively utilized for domain adaptation of conversational question answering.
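
The three stages named in the abstract (phrase extraction, history-aware question generation, answer revision) can be pictured as a simple loop. The following is a minimal sketch of that control flow only, not the authors' implementation: the listing does not describe the models used, so each stage is passed in as a callable, and the names `generate_conversation`, `toy_extract`, `toy_question`, and `toy_revise` are hypothetical. The toy stand-ins are rule-based placeholders included only so the sketch runs end to end.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # earlier (question, answer) turns


@dataclass
class Turn:
    question: str
    answer: str


def generate_conversation(
    passage: str,
    extract_phrases: Callable[[str], List[str]],
    generate_question: Callable[[str, History, str], str],
    revise_answer: Callable[[str, str, str], str],
) -> List[Turn]:
    """Run the three stages in order, carrying the dialogue history forward."""
    history: History = []
    turns: List[Turn] = []
    for candidate in extract_phrases(passage):
        # Stage 2: the question depends on the passage, the history, and the candidate.
        question = generate_question(passage, history, candidate)
        # Stage 3: the candidate answer is revised only after the question exists.
        answer = revise_answer(passage, question, candidate)
        history.append((question, answer))
        turns.append(Turn(question, answer))
    return turns


# Hypothetical rule-based stand-ins; a real system would use trained models here.
def toy_extract(passage: str) -> List[str]:
    # Treat bare numbers as the "question-worthy" phrases.
    tokens = [t.strip(".,") for t in passage.split()]
    return [t for t in tokens if t.isdigit()]


def toy_question(passage: str, history: History, answer: str) -> str:
    # The turn index stands in for conditioning on the previous conversation.
    return f"Q{len(history) + 1}: what does '{answer}' refer to?"


def toy_revise(passage: str, question: str, answer: str) -> str:
    # Extend a bare number with a following scale word, mimicking span revision.
    tokens = [t.strip(".,") for t in passage.split()]
    i = tokens.index(answer)
    if i + 1 < len(tokens) and tokens[i + 1] in {"million", "billion"}:
        return f"{answer} {tokens[i + 1]}"
    return answer


if __name__ == "__main__":
    text = "The bridge opened in 1932 and cost 6 million pounds."
    for turn in generate_conversation(text, toy_extract, toy_question, toy_revise):
        print(turn.question, "->", turn.answer)
```

In this toy run the second candidate, "6", is revised to "6 million" after its question is generated, which illustrates why revision is placed last: the final answer span is chosen with the question in hand rather than fixed at extraction time.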

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems