Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems

Yihao Feng; Shentao Yang; Shujian Zhang; Jianguo Zhang; Caiming Xiong; Mingyuan Zhou; Huan Wang

arXiv:2302.10342 · cs.CL · February 22, 2023 · 6 citations

TL;DR

This paper studies how to efficiently learn and use reward functions when training end-to-end task-oriented dialogue agents with reinforcement learning. It introduces generalized reward-learning objectives inspired by learning-to-rank methods and reports competitive results on MultiWOZ 2.0.

Contribution

The paper proposes new reward-function learning objectives for task-oriented dialogue systems and shows how to use the learned reward to improve the training of end-to-end dialogue agents.
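The objectives are described as inspired by learning-to-rank. As a minimal sketch, assume each dialogue trajectory in a group carries a scalar quality score (for example, a task-success metric) and that a reward model assigns each trajectory a predicted reward; two classical listwise losses can then train the reward model to reproduce the ranking implied by the scores. The function names `listnet_loss` and `plackett_luce_nll` and this scoring setup are illustrative assumptions, not the paper's exact objectives.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_softmax(xs: Sequence[float]) -> List[float]:
    lse = logsumexp(xs)
    return [x - lse for x in xs]


def listnet_loss(pred_rewards: Sequence[float], target_scores: Sequence[float],
                 temperature: float = 1.0) -> float:
    """ListNet-style cross-entropy between the top-one distributions induced by
    the (assumed) target quality scores and by the predicted rewards."""
    target_logp = log_softmax([s / temperature for s in target_scores])
    pred_logp = log_softmax(pred_rewards)
    return -sum(math.exp(t) * p for t, p in zip(target_logp, pred_logp))


def plackett_luce_nll(pred_rewards: Sequence[float],
                      target_scores: Sequence[float]) -> float:
    """Negative log-likelihood of the ranking implied by target_scores
    (best first) under a Plackett-Luce model with utilities pred_rewards."""
    order = sorted(range(len(target_scores)),
                   key=lambda i: target_scores[i], reverse=True)
    ranked = [pred_rewards[i] for i in order]
    # At each position, the chosen trajectory competes with all lower-ranked ones.
    return -sum(ranked[k] - logsumexp(ranked[k:]) for k in range(len(ranked)))
```

With only two trajectories, `plackett_luce_nll` reduces to the pairwise Bradley-Terry loss, -log σ(r₁ - r₂), so the listwise form can be read as a generalization of pairwise preference learning to longer ranked lists.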

Findings

01 Achieves competitive results on the end-to-end response-generation task of the MultiWOZ 2.0 dataset.

02 Introduces two generalized reward-learning objectives inspired by learning-to-rank.

03 Publicly releases source code and model checkpoints.

Abstract

When learning task-oriented dialogue (ToD) agents, reinforcement learning (RL) techniques can naturally be used to train dialogue strategies that achieve user-specific goals. Prior work has mainly focused on adopting advanced RL techniques to train ToD agents, while the design of the reward function has not been well studied. This paper aims to answer the question of how to efficiently learn and leverage a reward function for training end-to-end (E2E) ToD agents. Specifically, we introduce two generalized objectives for reward-function learning, inspired by the classical learning-to-rank literature. Further, we use the learned reward function to guide the training of the E2E ToD agent. With the proposed techniques, we achieve competitive results on the E2E response-generation task on the MultiWOZ 2.0 dataset. Source code and checkpoints are publicly released at…
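The abstract states that the learned reward function then guides training of the E2E agent. One common way to do this, shown here only as a hedged sketch, is a policy-gradient update that raises the probability of responses the reward model scores highly. To keep the example self-contained, the "agent" is a softmax policy over a fixed set of candidate responses and the gradient is computed exactly rather than by sampling. `learned_reward`, the candidate names, and their scores are hypothetical, and this is not necessarily the training procedure used in the paper.

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def policy_gradient_step(logits: Sequence[float], rewards: Sequence[float],
                         lr: float = 1.0) -> List[float]:
    """One gradient-ascent step on E_{y ~ pi}[R(y)] for a softmax policy over a
    fixed candidate set. dE/dlogit_k = p_k * (R_k - E[R]), so E[R] plays the
    role of the baseline that REINFORCE would otherwise estimate from samples."""
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))
    return [logit + lr * p * (r - baseline)
            for logit, p, r in zip(logits, probs, rewards)]


# Hypothetical stand-in for a trained reward model: maps a candidate
# system response to a scalar reward.
HYPOTHETICAL_SCORES = {"response_a": 0.1, "response_b": 0.9, "response_c": 0.3}


def learned_reward(response: str) -> float:
    return HYPOTHETICAL_SCORES[response]


def train(candidates: Sequence[str], reward_fn: Callable[[str], float],
          steps: int = 50, lr: float = 1.0) -> List[float]:
    """Start from a uniform policy and return the final response probabilities."""
    rewards = [reward_fn(c) for c in candidates]
    logits = [0.0] * len(candidates)
    for _ in range(steps):
        logits = policy_gradient_step(logits, rewards, lr)
    return softmax(logits)
```

Calling `train(["response_a", "response_b", "response_c"], learned_reward)` shifts probability mass toward `response_b`, the candidate with the highest hypothetical reward. In a real E2E agent the candidates would be generated by the model itself, the expectation would be estimated from sampled responses, and the reward term is often combined with the usual maximum-likelihood loss on reference responses.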

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Intelligent Tutoring Systems and Adaptive Learning