Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue

Huifang Du; Shuqin Li; Minghao Wu; Xuejing Feng; Yuan-Fang Li; Haofen Wang

arXiv:2406.14457·cs.AI·June 21, 2024

Open Access

TL;DR

This paper introduces a step-by-step reinforcement learning method that jointly optimizes understanding and generation in task-oriented dialogue systems, leading to improved performance and state-of-the-art results.

Contribution

It extends reinforcement learning to both dialogue understanding and generation tasks with token-level rewards, addressing sparse reward issues and improving overall system performance.

Findings

01

Achieves state-of-the-art results on MultiWOZ2.0, MultiWOZ2.1, and In-Car datasets.

02

Demonstrates superior few-shot learning capabilities in low-resource scenarios.

03

Effectively balances understanding and generation through step-by-step rewards.

Abstract

Reinforcement learning (RL) is a powerful approach to enhancing task-oriented dialogue (TOD) systems. However, existing RL methods focus mainly on generation tasks, such as dialogue policy learning (DPL) or response generation (RG), while neglecting dialogue state tracking (DST) for understanding. This narrow focus overlooks the interdependence between understanding and generation and so prevents systems from achieving globally optimal performance. Additionally, RL methods face sparse and delayed rewards, which complicate training and optimization. To address these issues, we extend RL to both understanding and generation tasks by introducing step-by-step rewards throughout token generation. The understanding reward increases as more slots are correctly filled in DST, while the generation reward grows with the accurate inclusion of user requests. Our approach…
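The abstract says only that the understanding reward increases as more slots are correctly filled and that the generation reward grows as user requests are included; it does not give the formulas. The sketch below is one possible reading, not the authors' implementation: it assumes each reward is the running fraction of gold items covered so far, and the function names, slot names, and placeholder tokens are hypothetical.

```python
from typing import Dict, List, Tuple


def understanding_rewards(
    predicted: List[Tuple[str, str]], gold_state: Dict[str, str]
) -> List[float]:
    """Running reward after each predicted (slot, value) pair.

    Assumption: reward = fraction of gold slots filled correctly so far.
    """
    correct = set()
    rewards = []
    for slot, value in predicted:
        if gold_state.get(slot) == value:
            correct.add(slot)  # a set avoids double-counting repeated slots
        rewards.append(len(correct) / max(len(gold_state), 1))
    return rewards


def generation_rewards(
    response_tokens: List[str], requested: List[str]
) -> List[float]:
    """Running reward after each response token.

    Assumption: reward = fraction of requested items mentioned so far.
    """
    wanted = set(requested)
    covered = set()
    rewards = []
    for token in response_tokens:
        if token in wanted:
            covered.add(token)
        rewards.append(len(covered) / max(len(wanted), 1))
    return rewards


# Hypothetical example data, for illustration only.
gold = {"hotel-area": "north", "hotel-stars": "4"}
pred = [("hotel-area", "north"), ("hotel-price", "cheap"), ("hotel-stars", "4")]
dst_r = understanding_rewards(pred, gold)  # [0.5, 0.5, 1.0]

tokens = ["the", "phone", "is", "[value_phone]", "at", "[value_address]"]
gen_r = generation_rewards(tokens, ["[value_phone]", "[value_address]"])
```

Because a reward is emitted at every step rather than once at the end of the dialogue, the learning signal is dense, which is the property the abstract proposes as a remedy for sparse and delayed rewards.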

Taxonomy

Topics: Speech and dialogue systems · AI in Service Interactions · Team Dynamics and Performance

Methods: Dynamic Sparse Training · Focus