Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems

Weiwei Sun; Shuo Zhang; Krisztian Balog; Zhaochun Ren; Pengjie Ren; Zhumin Chen; Maarten de Rijke

arXiv:2105.03748·cs.IR·May 11, 2021

TL;DR

This paper introduces a new dataset and methods for simulating user satisfaction in task-oriented dialogue systems to improve evaluation accuracy and human-likeness.

Contribution

It presents the USS dataset with satisfaction annotations and baseline models, enhancing the evaluation of dialogue systems through user satisfaction prediction.

Findings

1. Distributed representations outperform feature-based methods.
2. Hierarchical GRUs excel in in-domain satisfaction prediction.
3. BERT-based models generalize better across domains.

Abstract

Evaluation is crucial in the development process of task-oriented dialogue systems. As an evaluation method, user simulation allows us to tackle issues such as scalability and cost-efficiency, making it a viable choice for large-scale automatic evaluation. To help build a human-like user simulator that can measure the quality of a dialogue, we propose the following task: simulating user satisfaction for the evaluation of task-oriented dialogue systems. The purpose of the task is to increase the evaluation power of user simulations and to make the simulation more human-like. To overcome a lack of annotated data, we propose a user satisfaction annotation dataset, USS, that includes 6,800 dialogues sampled from multiple domains, spanning real-world e-commerce dialogues, task-oriented dialogues constructed through Wizard-of-Oz experiments, and movie recommendation dialogues. All user…

Code & Models

Repositories

sunnweiwei/user-satisfaction-simulation
PyTorch · Official

Datasets

akomma/uss-ratings-dataset
dataset · 7 dl
