Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems
Shiki Sato, Yosuke Kishinami, Hiroaki Sugiyama, Reina Akama, Ryoko Tokuhisa, Jun Suzuki

TL;DR
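This paper presents bipartite-play, a dialogue collection method for automatic evaluation of dialogue systems. It allows comparison with systems that are not publicly available, is less vulnerable to cheating through deliberate selection of the systems to be compared, and correlates with human judgments as strongly as existing methods.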
Contribution
The paper introduces bipartite-play, a dialogue collection method that addresses two limitations of existing collection methods used for automatic dialogue system evaluation.
Findings
Mitigates the inability to compare against systems that are not publicly available
Reduces vulnerability to cheating by intentionally selecting the systems to be compared
Correlates with human subjective judgments as strongly as existing methods
Abstract
Automation of dialogue system evaluation is a driving force for the efficient development of dialogue systems. This paper introduces the bipartite-play method, a dialogue collection method for automating dialogue system evaluation. It addresses the limitations of existing dialogue collection methods: (i) inability to compare with systems that are not publicly available, and (ii) vulnerability to cheating by intentionally selecting systems to be compared. Experimental results show that the automatic evaluation using the bipartite-play method mitigates these two drawbacks and correlates as strongly with human subjectivity as existing methods.
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · AI in Service Interactions
