Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems

Shiki Sato; Yosuke Kishinami; Hiroaki Sugiyama; Reina Akama; Ryoko Tokuhisa; Jun Suzuki

arXiv:2211.10596 · cs.CL · November 22, 2022 · 1 citation

TL;DR

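This paper presents bipartite-play, a dialogue collection method for automatic evaluation of dialogue systems. It allows comparison with systems that are not publicly available, is less vulnerable to cheating through intentional selection of the systems to be compared, and yields evaluations that correlate with human judgments as strongly as those of existing methods.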

Contribution

The paper introduces bipartite-play, a new dialogue collection approach that overcomes limitations of existing methods in automatic dialogue system evaluation.

Findings

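1. Mitigates the inability to compare against systems that are not publicly available
2. Reduces vulnerability to cheating by intentionally selecting the systems to be compared
3. Correlates with human subjective judgments as strongly as existing dialogue collection methods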

Abstract

Automation of dialogue system evaluation is a driving force for the efficient development of dialogue systems. This paper introduces the bipartite-play method, a dialogue collection method for automating dialogue system evaluation. It addresses the limitations of existing dialogue collection methods: (i) inability to compare with systems that are not publicly available, and (ii) vulnerability to cheating by intentionally selecting systems to be compared. Experimental results show that the automatic evaluation using the bipartite-play method mitigates these two drawbacks and correlates as strongly with human subjectivity as existing methods.
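The abstract's final claim is about how well automatic scores correlate with human judgments. As a rough illustration of how such a check could be run, the sketch below computes a Spearman rank correlation between per-system automatic scores and per-system human ratings. This is an assumption for illustration only: the abstract does not state which correlation measure the authors used, and the function names and score values here are hypothetical, not taken from the paper.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    norm_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (norm_x * norm_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical per-system scores (made up for illustration, one entry per system).
automatic_scores = [0.61, 0.48, 0.75, 0.52]
human_ratings = [3.9, 3.1, 4.4, 3.0]

rho = spearman(automatic_scores, human_ratings)
```

A value of `rho` close to 1 would mean the automatic evaluation ranks the systems in nearly the same order as the human raters do. A rank-based measure is used in this sketch because the two score scales need not be comparable.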

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · AI in Service Interactions