Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation

Varun Gangal; Harsh Jhamtani; Eduard Hovy; Taylor Berg-Kirkpatrick

arXiv:2106.02833·cs.CL·June 8, 2021

TL;DR

This paper introduces a novel automatic reference augmentation method for open domain dialog evaluation, improving the correlation between automated metrics and human judgments by leveraging knowledge bases and dialog corpora.

Contribution

It proposes an automatic technique to expand reference responses using knowledge sources and retrieval, reducing reliance on costly human annotations.

Findings

01 Enhanced correlation of automated metrics with human ratings.

02 Automatic reference augmentation improves evaluation robustness.

03 Method is effective on the DailyDialog dataset.

Abstract

Multiple different responses are often plausible for a given open domain dialog context. Prior work has shown the importance of having multiple valid reference responses for meaningful and robust automated evaluations. In such cases, common practice has been to collect more human-written references. However, such collection can be expensive, time-consuming, and not easily scalable. Instead, we propose a novel technique for automatically expanding a human-generated reference to a set of candidate references. We fetch plausible references from knowledge sources and adapt them so that they are more fluent in the context of the dialog instance in question. More specifically, we use (1) a commonsense knowledge base to elicit a large number of plausible reactions given the dialog history, and (2) relevant instances retrieved from a dialog corpus, using similar past as well as future contexts. We…
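
To make the role of an expanded reference set concrete, here is a minimal sketch of multi-reference scoring. It is an illustration only, not the paper's metric or code: the unigram F1 overlap, the function names `unigram_f1` and `multi_reference_score`, and the example sentences are all assumptions chosen for brevity. The idea shown is that a plausible response can score poorly against a single human reference yet score well once further valid references are available.

```python
from collections import Counter
from typing import Sequence


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenized strings.

    A stand-in for a real overlap metric; chosen only to keep the sketch short.
    """
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(hypothesis: str, references: Sequence[str]) -> float:
    """Score a response against a reference set by taking the best match."""
    if not references:
        raise ValueError("at least one reference is required")
    return max(unigram_f1(hypothesis, ref) for ref in references)


# Hypothetical example data (not taken from the paper or DailyDialog).
human_reference = "i would love to go hiking"
augmented_references = [
    human_reference,
    "sorry i am busy this weekend",
    "that sounds fun",
]
response = "sorry i am busy"

single = multi_reference_score(response, [human_reference])
expanded = multi_reference_score(response, augmented_references)
print(f"single reference: {single:.2f}, expanded set: {expanded:.2f}")
```

Taking the maximum over references is one common aggregation choice; other choices, such as averaging, would change the numbers but not the point that a wider reference set gives valid responses more chances to be recognized.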

Code & Models

Repositories

harsh19/Diverse-Reference-Augmentation (Official)
