ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark
Hiromi Wakaki, Yuki Mitsufuji, Yoshinori Maeda, Yukiko Nishimura, Silin Gao, Mengjie Zhao, Keiichi Yamada, and Antoine Bosselut

TL;DR
ComperDial is a benchmark of human-scored responses for training and evaluating automatic evaluation metrics for open-domain dialogue systems. The paper also introduces CPDScore, an automatic metric reported to correlate more strongly with human judgments than existing metrics.
Contribution
The paper introduces ComperDial, a comprehensive dialogue dataset with diverse responses and human scores, and proposes CPDScore, an improved automatic evaluation metric for dialogue quality.
Findings
CPDScore correlates more strongly with human judgments than existing metrics.
ComperDial includes human-scored responses for 10,395 dialogue turns in 1,485 conversations, collected from 99 dialogue agents.
The benchmark enables robust evaluation of dialogue systems at both turn and dialogue levels.
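The first finding is a statement about rank correlation between an automatic metric and human scores. As a rough illustration of how such a comparison is typically computed, the sketch below implements Spearman correlation in plain Python. The choice of Spearman, the function names, and all score values are assumptions made for illustration; they are not taken from the paper or from the ComperDial data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy scores for five responses (not real ComperDial data).
human_scores = [1, 2, 3, 4, 5]
metric_a = [0.1, 0.3, 0.2, 0.8, 0.9]  # mostly agrees with the human ordering
metric_b = [0.5, 0.1, 0.9, 0.2, 0.4]  # largely unrelated to the human ordering

print(round(spearman(human_scores, metric_a), 3))  # 0.9
print(round(spearman(human_scores, metric_b), 3))  # -0.1
```

Under this setup, a metric that ranks responses the way human annotators do scores close to 1, while an uninformative metric scores close to 0. The same computation could be applied to per-turn scores or to per-dialogue scores.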
Abstract
We propose a new benchmark, ComperDial, which facilitates the training and evaluation of evaluation metrics for open-domain dialogue systems. ComperDial consists of human-scored responses for 10,395 dialogue turns in 1,485 conversations collected from 99 dialogue agents submitted to the Commonsense Persona-grounded Dialogue (CPD) challenge. As a result, for any dialogue, our benchmark includes multiple diverse responses with a variety of characteristics to ensure more robust evaluation of learned dialogue metrics. In addition to single-turn response scores, ComperDial also contains dialogue-level human-annotated scores, enabling joint assessment of multi-turn model responses throughout a dialogue. Finally, building off ComperDial, we devise a new automatic evaluation metric to measure the general similarity of model-generated dialogues to human conversations. Our experimental results…
Taxonomy
Topics: Persona Design and Applications
