KGConv, a Conversational Corpus grounded in Wikidata
Quentin Brabant, Gwenole Lecorve, Lina M. Rojas-Barahona, Claire Gardent

TL;DR
KGConv is a large conversational dataset grounded in Wikidata, enabling research in knowledge-based question generation, rewriting, and answering through diverse, annotated question variants.
Contribution
The paper introduces KGConv, a large-scale conversational corpus grounded in Wikidata, with multiple question variants per fact and baselines for knowledge-based conversational question generation.
Findings
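KGConv contains 71k conversations, each with 8.6 questions on average.
Each Wikidata fact comes with multiple question variants (12 on average), produced by templates, human annotations, hand-crafted rules and a neural question rewriting model.
Baselines are provided for knowledge-based conversational question generation.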
Abstract
We present KGConv, a large, conversational corpus of 71k conversations where each question-answer pair is grounded in a Wikidata fact. Conversations contain on average 8.6 questions and for each Wikidata fact, we provide multiple variants (12 on average) of the corresponding question using templates, human annotations, hand-crafted rules and a question rewriting neural model. We provide baselines for the task of Knowledge-Based, Conversational Question Generation. KGConv can further be used for other generation and analysis tasks such as single-turn question generation from Wikidata triples, question rewriting, question answering from conversation or from knowledge graphs and quiz generation.
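The structure described in the abstract (a conversation as a sequence of question-answer pairs, each grounded in one Wikidata fact and carrying several question variants) can be sketched as follows. This is only a minimal illustration: the class names, field names and example questions are hypothetical and do not reflect the released KGConv file format.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Turn:
    # Hypothetical layout: one Wikidata fact as (subject, property, object) labels.
    triple: tuple[str, str, str]
    # Several surface forms of the same question (the abstract reports 12 on average).
    question_variants: list[str]
    answer: str


@dataclass
class Conversation:
    turns: list[Turn] = field(default_factory=list)


def sample_dialogue(conv: Conversation, rng: random.Random) -> list[tuple[str, str]]:
    """Pick one question variant per turn, yielding one concrete dialogue."""
    return [(rng.choice(t.question_variants), t.answer) for t in conv.turns]


# Illustrative content only, not taken from the dataset.
conv = Conversation(turns=[
    Turn(
        triple=("Douglas Adams", "place of birth", "Cambridge"),
        question_variants=[
            "Where was Douglas Adams born?",
            "What is the birthplace of Douglas Adams?",
        ],
        answer="Cambridge",
    ),
    Turn(
        triple=("Douglas Adams", "country of citizenship", "United Kingdom"),
        question_variants=[
            "What is his country of citizenship?",
            "Which country is he a citizen of?",
        ],
        answer="United Kingdom",
    ),
])

for question, answer in sample_dialogue(conv, random.Random(0)):
    print(f"Q: {question}\nA: {answer}")
```

Sampling a different variant at each turn is one way such a corpus could yield many distinct dialogues from the same underlying sequence of facts.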
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
