A Review of Evaluation Techniques for Social Dialogue Systems

Amanda Cercas Curry; Helen Hastie; Verena Rieser

arXiv:1709.04409·cs.CL·September 14, 2017

TL;DR

This paper reviews current evaluation techniques for social dialogue systems, highlighting their limitations such as ignoring context and lacking grounding in human perceptions, and discusses the challenges in assessing non-goal-oriented conversations.

Contribution

It provides a comprehensive review of existing automatic evaluation methods for social dialogue systems and critically analyzes their shortcomings.

Findings

01 Turn-based metrics often ignore context.

02 End-of-dialogue rewards are mainly hand-crafted.

03 Current metrics lack grounding in human perceptions.

Abstract

In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success. Consequently, evaluation of these systems is notoriously hard. In this paper, we review current evaluation methods, focusing on automatic metrics. We conclude that turn-based metrics often ignore the context and do not account for the fact that several replies are valid, while end-of-dialogue rewards are mainly hand-crafted. Both lack grounding in human perceptions.
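As a rough illustration of why a turn-based overlap metric can miss valid replies, the sketch below scores two candidate replies against a single reference using a simplified clipped unigram precision. The metric, the function name `unigram_precision`, and the example sentences are all assumptions made for illustration; none of them is taken from the paper.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision of a candidate reply against ONE reference.

    Hypothetical, simplified stand-in for a word-overlap metric; it sees
    only the reference reply, not the dialogue context.
    """
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    # Count each candidate token at most as often as it occurs in the reference.
    overlap = sum(
        min(count, ref_counts[token])
        for token, count in Counter(cand_tokens).items()
    )
    return overlap / len(cand_tokens)


# Illustrative context: the user asked "What do you do on weekends?"
reference = "i love hiking in the mountains"
close_reply = "i love hiking in the hills"          # near-copy of the reference
valid_but_different = "mostly reading and cooking"  # also a sensible answer

score_close = unigram_precision(close_reply, reference)
score_different = unigram_precision(valid_but_different, reference)

print(f"close reply:          {score_close:.2f}")
print(f"valid, different one: {score_different:.2f}")
```

Both replies answer the question sensibly, yet the second shares no words with the reference and so receives a score of zero. The score also never consults the preceding turn, which is the sense in which such metrics ignore context.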


Taxonomy

Topics: Speech and dialogue systems · Multi-Agent Systems and Negotiation · Team Dynamics and Performance