On Evaluating and Comparing Open Domain Dialog Systems
Anu Venkatesh, Chandra Khatri, Ashwin Ram, Fenfei Guo, Raefer Gabriel, Ashish Nagar, Rohit Prasad, Ming Cheng, Behnam Hedayatnia, Angeliki Metallinou, Rahul Goel, Shaohua Yang, Anirudh Raju

TL;DR
This paper presents a comprehensive, multi-metric evaluation strategy for open domain dialog systems, aiming to reduce subjectivity and better correlate with human judgments, based on data from the Alexa Prize competition.
Contribution
It introduces a novel multi-metric evaluation framework that provides granular analysis and a unified scoring mechanism for conversational agents, leveraging large-scale real-world data.
Findings
Metrics correlate well with human judgment
Proposed evaluation reduces subjectivity in assessments
Framework applied successfully in Alexa Prize competition
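The first finding rests on checking how closely an automatic metric tracks human ratings. The summary does not say which correlation measure or cutoff the authors used, so the following is only a minimal sketch under stated assumptions: it uses Spearman rank correlation, an arbitrary threshold of 0.5, hypothetical metric names, and toy numbers invented purely for illustration (none of them come from the paper or the Alexa Prize data).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def select_metrics(metric_scores, human_ratings, threshold=0.5):
    """Keep metrics whose |Spearman rho| with human ratings meets the threshold.

    The threshold is an assumption of this sketch, not a value from the paper.
    """
    selected = {}
    for name, scores in metric_scores.items():
        rho = spearman(scores, human_ratings)
        if abs(rho) >= threshold:
            selected[name] = rho
    return selected


# Toy, invented data: one human rating and one score per metric per conversation.
human_ratings = [1, 2, 3, 4, 5]
metric_scores = {
    "hypothetical_metric_a": [0.1, 0.2, 0.4, 0.5, 0.9],
    "hypothetical_metric_b": [0.5, 0.1, 0.4, 0.2, 0.3],
}
selected = select_metrics(metric_scores, human_ratings)
```

With this toy input, only hypothetical_metric_a is kept, since its ranking matches the human ratings exactly while hypothetical_metric_b does not reach the cutoff. A rank correlation is used here because user ratings are ordinal; a different measure could be substituted without changing the selection logic.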
Abstract
Conversational agents are exploding in popularity. However, much work remains in the area of non-goal-oriented conversations, despite significant growth in research interest over recent years. To advance the state of the art in conversational AI, Amazon launched the Alexa Prize, a 2.5-million-dollar university competition where sixteen selected university teams built conversational agents to deliver the best social conversational experience. Alexa Prize provided the academic community with the unique opportunity to perform research with a live system used by millions of users. The subjectivity associated with evaluating conversations is a key element underlying the challenge of building non-goal-oriented dialogue systems. In this paper, we propose a comprehensive evaluation strategy with multiple metrics designed to reduce subjectivity by selecting metrics which correlate well with human…
Taxonomy
Topics: Topic Modeling · AI in Service Interactions · Speech and dialogue systems
