Adversarial Evaluation of Dialogue Models

Anjuli Kannan; Oriol Vinyals

arXiv:1701.08198 · cs.CL · January 31, 2017 · 66 cites

TL;DR

This paper explores adversarial evaluation of dialogue models: an RNN discriminator is trained to distinguish machine-generated responses from human ones, with the aim of improving automatic evaluation.

Contribution

It investigates the feasibility of adversarial evaluation for dialogue systems and discusses challenges and potential directions for future research.

Findings

1. Adversarial evaluation shows some promise but faces practical challenges.
2. The RNN discriminator can partially distinguish machine from human responses.
3. Further research is needed to refine adversarial evaluation methods.

Abstract

The recent application of RNN encoder-decoder models has resulted in substantial progress in fully data-driven dialogue systems, but evaluation remains a challenge. An adversarial loss could be a way to directly evaluate the extent to which generated dialogue responses sound like they came from a human. This could reduce the need for human evaluation, while more directly evaluating on a generative task. In this work, we investigate this idea by training an RNN to discriminate a dialogue model's samples from human-generated samples. Although we find some evidence this setup could be viable, we also note that many issues remain in its practical application. We discuss both aspects and conclude that future work is warranted.
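
To make the setup in the abstract concrete, the following is a minimal sketch of a discriminator of this general kind, not the authors' implementation. It assumes a plain tanh RNN over token ids, a sigmoid output read as the probability that a response is human-written, and a binary cross-entropy loss. The names `init_params`, `prob_human`, `discriminator_loss`, and `discriminator_accuracy` are hypothetical, the weights are random and untrained, and the discriminator here sees only the response (a conditioned variant could also consume the preceding message).

```python
import math
import random


def init_params(vocab_size, hidden_size, seed=0):
    """Randomly initialise a tiny RNN discriminator (illustrative sizes only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": mat(vocab_size, hidden_size),  # token id -> input vector
        "W_hh": mat(hidden_size, hidden_size),  # recurrent weights
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)],
        "b_out": 0.0,
    }


def prob_human(params, token_ids):
    """Run the RNN over a response and return P(response is human-written)."""
    n = len(params["w_out"])
    h = [0.0] * n
    for t in token_ids:
        x = params["embed"][t]
        # Vanilla RNN update: h_new = tanh(x + W_hh @ h_old)
        h = [
            math.tanh(x[i] + sum(params["W_hh"][i][j] * h[j] for j in range(n)))
            for i in range(n)
        ]
    logit = params["b_out"] + sum(w * hi for w, hi in zip(params["w_out"], h))
    return 1.0 / (1.0 + math.exp(-logit))


def discriminator_loss(params, examples):
    """Mean binary cross-entropy; label 1 = human, 0 = model-generated."""
    total = 0.0
    for token_ids, label in examples:
        p = prob_human(params, token_ids)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(examples)


def discriminator_accuracy(params, examples):
    """Fraction of responses whose source the discriminator identifies correctly."""
    correct = sum(
        (prob_human(params, ids) >= 0.5) == bool(label) for ids, label in examples
    )
    return correct / len(examples)


# Toy data: made-up token ids, with human (1) and generated (0) labels.
examples = [([1, 4, 2], 1), ([3, 3, 0], 0), ([2, 5, 1, 4], 1), ([0, 0, 3], 0)]
params = init_params(vocab_size=6, hidden_size=4)
loss = discriminator_loss(params, examples)
accuracy = discriminator_accuracy(params, examples)
```

In this framing, a trained discriminator's held-out accuracy would serve as the evaluation signal: accuracy near 50% on a balanced set would suggest the dialogue model's samples are hard to tell apart from human responses, while higher accuracy would suggest they are easier to detect. Training the weights (for example by gradient descent on `discriminator_loss`) is omitted here.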

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications