A Review of Evaluation Techniques for Social Dialogue Systems
Amanda Cercas Curry, Helen Hastie, Verena Rieser

TL;DR
This paper reviews current evaluation techniques for social dialogue systems, highlighting limitations such as ignoring dialogue context and lacking grounding in human perceptions, and discusses why non-goal-oriented conversations are hard to assess.
Contribution
It provides a comprehensive review of existing automatic evaluation methods for social dialogue systems and critically analyzes their shortcomings.
Findings
Turn-based metrics often ignore context.
End-of-dialogue rewards are mainly hand-crafted.
Current metrics lack grounding in human perceptions.
Abstract
In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success. Consequently, evaluation of these systems is notoriously hard. In this paper, we review current evaluation methods, focusing on automatic metrics. We conclude that turn-based metrics often ignore the context and do not account for the fact that several replies are valid, while end-of-dialogue rewards are mainly hand-crafted. Both lack grounding in human perceptions.
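The claim that turn-based metrics ignore context and do not account for several valid replies can be illustrated with a toy word-overlap score. The sketch below is an assumption for illustration only, not a metric defined in the paper: `unigram_f1` is a hypothetical, simplified stand-in for reference-based overlap metrics such as BLEU. It compares one candidate reply against one reference reply and never consults the preceding turn.

```python
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    """Hypothetical toy turn-level score: F1 over unigram counts.

    Only the candidate reply and a single reference reply are compared;
    the dialogue context is never consulted.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

context = "what did you do at the weekend?"  # deliberately unused by the metric
reference = "i went hiking with my sister"
replies = [
    "i went hiking with my sister",        # copies the reference
    "mostly stayed home and read a book",  # a different but valid reply
    "my sister is a doctor",               # off-topic, but shares words
]
for reply in replies:
    print(f"{unigram_f1(reply, reference):.2f}  {reply}")
```

Running it prints 1.00, 0.00 and 0.36 respectively: the valid but differently worded reply scores lowest, while the off-topic reply is rewarded for sharing two words with the reference. Adding more references or conditioning on `context` would mitigate, but not remove, this mismatch with human judgement.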
Taxonomy
Topics: Speech and dialogue systems · Multi-Agent Systems and Negotiation · Team Dynamics and Performance
