Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems
Asma Ghandeharioun, Judy Hanwen Shen, Natasha Jaques, Craig Ferguson, Noah Jones, Agata Lapedriza, Rosalind Picard

TL;DR
This paper introduces a novel self-play based evaluation method for open-domain dialog systems that better correlates with human judgment than existing automated metrics.
Contribution
It proposes a model-agnostic, dataset-agnostic self-play evaluation approach that approximates interactive human evaluation, improving the assessment of dialog quality.
Findings
The self-play metric correlates significantly with human ratings (Pearson r > .7, p < .05).
The method outperforms existing automated metrics.
The evaluation platform and dataset are open-sourced.
Abstract
Building an open-domain conversational agent is a challenging problem. Current evaluation methods, mostly post-hoc judgments of static conversation, do not capture conversation quality in a realistic interactive context. In this paper, we investigate interactive human evaluation and provide evidence for its necessity; we then introduce a novel, model-agnostic, and dataset-agnostic method to approximate it. In particular, we propose a self-play scenario where the dialog system talks to itself and we calculate a combination of proxies such as sentiment and semantic coherence on the conversation trajectory. We show that this metric is capable of capturing the human-rated quality of a dialog model better than any automated metric known to date, achieving a significant Pearson correlation (r>.7, p<.05). To investigate the strengths of this novel metric and interactive evaluation in…
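The self-play idea described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `respond` callable stands in for any dialog model, the tiny word lexicon stands in for a learned sentiment model, Jaccard word overlap stands in for embedding-based semantic coherence, and the equal weights are arbitrary. The `pearson` helper only shows how such a per-model score could be compared against human ratings.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical toy lexicon; a real system would likely use a trained sentiment model.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "hate", "sad", "awful"}


def self_play(respond: Callable[[List[str]], str], seed: str, turns: int) -> List[str]:
    """Let one dialog model talk to itself: each reply is appended to the history it sees next."""
    history = [seed]
    for _ in range(turns):
        history.append(respond(history))
    return history


def _tokens(utterance: str) -> List[str]:
    return utterance.lower().split()


def sentiment(utterance: str) -> float:
    """Lexicon score in [-1, 1]: (positive words - negative words) / total words."""
    toks = _tokens(utterance)
    if not toks:
        return 0.0
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in toks) / len(toks)


def coherence(prev: str, cur: str) -> float:
    """Jaccard word overlap between consecutive turns (stand-in for embedding similarity)."""
    a, b = set(_tokens(prev)), set(_tokens(cur))
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def self_play_score(history: Sequence[str], w_sent: float = 0.5, w_coh: float = 0.5) -> float:
    """Weighted combination of the two proxies, averaged over the conversation trajectory."""
    sent = sum(sentiment(u) for u in history) / len(history)
    pairs = list(zip(history, history[1:]))
    coh = sum(coherence(p, c) for p, c in pairs) / len(pairs) if pairs else 0.0
    return w_sent * sent + w_coh * coh


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences with nonzero variance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In use, one would run `self_play` for each dialog model under comparison, compute `self_play_score` on each trajectory, and pass the resulting list together with the corresponding human ratings to `pearson`. The choice and weighting of proxies is the part most likely to differ from the authors' actual method.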
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
