Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems

Asma Ghandeharioun; Judy Hanwen Shen; Natasha Jaques; Craig Ferguson; Noah Jones; Agata Lapedriza; Rosalind Picard

arXiv:1906.09308·cs.CL·November 5, 2019·51 cites

Open Access · 2 Repos

TL;DR

This paper introduces a novel self-play-based evaluation method for open-domain dialog systems that correlates better with human judgment than existing automated metrics.

Contribution

It proposes a model-agnostic, dataset-agnostic self-play evaluation approach that approximates interactive human evaluation, improving the assessment of dialog quality.

Findings

01 The self-play metric correlates significantly with human ratings of dialog quality (Pearson r>.7, p<.05).

02 The metric captures human-rated dialog quality better than existing automated metrics.

03 The evaluation platform and dataset are open-sourced.
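The correlation reported in the first finding is a Pearson coefficient between an automated score and human ratings. As a reminder of what that statistic measures, here is a minimal pure-Python sketch. The function name `pearson_r` and its inputs are illustrative assumptions, not code from the paper's released platform, and no values from the paper are reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers.

    Illustrative helper (hypothetical name): xs could hold one automated
    score per dialog model and ys the matching mean human rating.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)
```

A value near 1 means the automated score ranks models almost exactly as human raters do; the accompanying p-value (not computed in this sketch) indicates whether such a correlation could plausibly arise by chance given the number of models compared.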

Abstract

Building an open-domain conversational agent is a challenging problem. Current evaluation methods, mostly post-hoc judgments of static conversation, do not capture conversation quality in a realistic interactive context. In this paper, we investigate interactive human evaluation and provide evidence for its necessity; we then introduce a novel, model-agnostic, and dataset-agnostic method to approximate it. In particular, we propose a self-play scenario where the dialog system talks to itself and we calculate a combination of proxies such as sentiment and semantic coherence on the conversation trajectory. We show that this metric is capable of capturing the human-rated quality of a dialog model better than any automated metric known to date, achieving a significant Pearson correlation (r>.7, p<.05). To investigate the strengths of this novel metric and interactive evaluation in…
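The self-play idea in the abstract can be sketched in a few lines: the model's own output is fed back as its next input, and proxy scores are computed over the resulting trajectory. Everything below is an assumption made for illustration, not the authors' implementation: the word-list sentiment scorer and bag-of-words cosine similarity are toy stand-ins for whatever sentiment and semantic-coherence models are actually used, the equal weights are arbitrary, and `echo_bot` is a hypothetical placeholder for a real dialog model.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; a stand-in for a learned sentiment model.
POSITIVE = {"good", "great", "love", "happy", "nice"}
NEGATIVE = {"bad", "hate", "sad", "awful", "boring"}


def tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]


def sentiment(text):
    """Toy sentiment in [-1, 1]: (positive - negative words) / token count."""
    toks = tokens(text)
    if not toks:
        return 0.0
    pos = sum(t in POSITIVE for t in toks)
    neg = sum(t in NEGATIVE for t in toks)
    return (pos - neg) / len(toks)


def coherence(a, b):
    """Toy semantic coherence: bag-of-words cosine similarity of two turns."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def self_play(respond, seed, turns):
    """Let the model talk to itself; `respond` maps the history to a reply."""
    trajectory = [seed]
    for _ in range(turns):
        trajectory.append(respond(trajectory))
    return trajectory


def self_play_score(trajectory, w_sentiment=0.5, w_coherence=0.5):
    """Weighted sum of mean sentiment and mean adjacent-turn coherence.

    The weights are arbitrary placeholders, not values from the paper.
    """
    mean_sent = sum(sentiment(u) for u in trajectory) / len(trajectory)
    pairs = list(zip(trajectory, trajectory[1:]))
    mean_coh = sum(coherence(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return w_sentiment * mean_sent + w_coherence * mean_coh


def echo_bot(history):
    """Hypothetical placeholder model: reuses the last word of the last turn."""
    return "I love " + history[-1].split()[-1]


if __name__ == "__main__":
    trajectory = self_play(echo_bot, "I love dogs", turns=3)
    print(trajectory)
    print(self_play_score(trajectory))
```

Because the scoring functions only see the list of utterances, any dialog model that can map a history to a reply can be plugged in as `respond`, which is one way to read the abstract's "model-agnostic" claim.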

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications

Methods: Knowledge Distillation