KGConv, a Conversational Corpus grounded in Wikidata

Quentin Brabant; Gwenole Lecorve; Lina M. Rojas-Barahona; Claire Gardent

arXiv:2308.15298·cs.CL·August 30, 2023

TL;DR

KGConv is a large conversational dataset grounded in Wikidata, enabling research in knowledge-based question generation, rewriting, and answering through diverse, annotated question variants.

Contribution

The paper introduces KGConv, a novel large-scale conversational corpus grounded in Wikidata with multiple question variants and baseline tasks for knowledge-based conversational AI.

Findings

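01. KGConv contains 71k conversations, averaging 8.6 questions each.

02. Each Wikidata fact comes with multiple question variants (12 on average), which increases diversity.

03. Baselines are provided for knowledge-based conversational question generation, showing the dataset's utility for that task.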

Abstract

We present KGConv, a large conversational corpus of 71k conversations in which each question-answer pair is grounded in a Wikidata fact. Conversations contain 8.6 questions on average, and for each Wikidata fact we provide multiple variants (12 on average) of the corresponding question, produced using templates, human annotations, hand-crafted rules, and a question-rewriting neural model. We provide baselines for the task of Knowledge-Based, Conversational Question Generation. KGConv can further be used for other generation and analysis tasks, such as single-turn question generation from Wikidata triples, question rewriting, question answering from conversation or from knowledge graphs, and quiz generation.
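To make the structure described in the abstract concrete, here is a minimal sketch of how one such conversation could be represented in memory. The class names, field names, and variant-source keys (`template`, `human`, `rule`, `neural_rewrite`) are assumptions chosen for illustration and are not the released KGConv schema. The example facts are toy data.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass(frozen=True)
class Triple:
    """A Wikidata-style fact: (subject, property, object). Hypothetical type."""
    subject: str
    prop: str
    obj: str


@dataclass
class Turn:
    """One question-answer pair grounded in a single fact."""
    fact: Triple
    # Maps a variant source (assumed key names) to the questions it produced.
    question_variants: dict[str, list[str]]
    answer: str

    def all_variants(self) -> list[str]:
        """Flatten the variants from every source into one list."""
        return [q for qs in self.question_variants.values() for q in qs]


@dataclass
class Conversation:
    turns: list[Turn] = field(default_factory=list)


def average_questions(conversations: list[Conversation]) -> float:
    """Mean number of question-answer turns per conversation."""
    return mean(len(c.turns) for c in conversations)


def average_variants(conversations: list[Conversation]) -> float:
    """Mean number of question variants per grounded fact."""
    return mean(len(t.all_variants()) for c in conversations for t in c.turns)


# Toy conversation with two turns (illustrative data, not taken from the corpus).
example = Conversation(turns=[
    Turn(
        fact=Triple("Douglas Adams", "place of birth", "Cambridge"),
        question_variants={
            "template": ["What is the place of birth of Douglas Adams?"],
            "human": ["Where was Douglas Adams born?"],
            "neural_rewrite": ["In which city was Douglas Adams born?"],
        },
        answer="Cambridge",
    ),
    Turn(
        fact=Triple("Cambridge", "country", "United Kingdom"),
        question_variants={
            "template": ["What is the country of Cambridge?"],
            "rule": ["Which country is it in?"],  # in-context, pronominalised form
        },
        answer="United Kingdom",
    ),
])

print(average_questions([example]), average_variants([example]))
```

Computed over the full corpus, statistics of this kind would correspond to the averages of 8.6 questions per conversation and 12 variants per fact reported above.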

Code & Models

Datasets

Orange/KGConv
dataset · 310 downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems