Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue
Huifang Du, Shuqin Li, Minghao Wu, Xuejing Feng, Yuan-Fang Li, Haofen Wang

TL;DR
This paper introduces a step-by-step reinforcement learning method that jointly optimizes understanding and generation in task-oriented dialogue systems, leading to improved performance and state-of-the-art results.
Contribution
It extends reinforcement learning to both dialogue understanding and generation tasks with token-level rewards, addressing sparse reward issues and improving overall system performance.
Findings
Achieves state-of-the-art results on MultiWOZ2.0, MultiWOZ2.1, and In-Car datasets.
Demonstrates superior few-shot learning capabilities in low-resource scenarios.
Effectively balances understanding and generation through step-by-step rewards.
Abstract
Reinforcement learning (RL) is a powerful approach to enhance task-oriented dialogue (TOD) systems. However, existing RL methods tend to focus mainly on generation tasks, such as dialogue policy learning (DPL) or response generation (RG), while neglecting dialogue state tracking (DST) for understanding. This narrow focus prevents the systems from achieving globally optimal performance because it overlooks the interdependence between understanding and generation. Additionally, RL methods face challenges with sparse and delayed rewards, which complicate training and optimization. To address these issues, we extend RL into both understanding and generation tasks by introducing step-by-step rewards throughout token generation. The understanding reward increases as more slots are correctly filled in DST, while the generation reward grows with the accurate inclusion of user requests. Our approach…
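The two step-by-step rewards described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' formulation. The `slot=value; slot=value` state format, the bracketed placeholders such as `[phone]`, and the function names `parse_slots`, `understanding_rewards`, and `generation_rewards` are all assumptions made for illustration. The only idea taken from the abstract is that a reward is emitted after every generated token: it rises as more slots are filled correctly, or as more user-requested items appear in the response.

```python
from typing import Dict, List, Set


def parse_slots(text: str) -> Dict[str, str]:
    """Parse a 'slot=value; slot=value' string into a dict (hypothetical format)."""
    slots: Dict[str, str] = {}
    for part in text.split(";"):
        if "=" in part:
            name, value = part.split("=", 1)
            slots[name.strip()] = value.strip()
    return slots


def understanding_rewards(tokens: List[str], gold: Dict[str, str]) -> List[float]:
    """After each token, reward = fraction of gold slots correctly filled so far."""
    rewards: List[float] = []
    for t in range(1, len(tokens) + 1):
        partial = parse_slots(" ".join(tokens[:t]))
        correct = sum(1 for slot, value in gold.items() if partial.get(slot) == value)
        rewards.append(correct / len(gold) if gold else 0.0)
    return rewards


def generation_rewards(tokens: List[str], requested: Set[str]) -> List[float]:
    """After each token, reward = fraction of requested items already mentioned."""
    rewards: List[float] = []
    seen: Set[str] = set()
    for token in tokens:
        if token in requested:
            seen.add(token)
        rewards.append(len(seen) / len(requested) if requested else 0.0)
    return rewards


# Assumed toy inputs: a belief state and a delexicalized response.
dst_tokens = ["area=centre", ";", "food=thai"]
gold_state = {"area": "centre", "food": "thai"}
response_tokens = ["the", "[phone]", "is", "at", "[address]"]
requested_items = {"[phone]", "[address]"}

dst_rewards = understanding_rewards(dst_tokens, gold_state)
resp_rewards = generation_rewards(response_tokens, requested_items)
```

Because every token position receives a value, the learning signal is dense rather than a single reward at the end of the dialogue, which is the sparse-reward problem the abstract mentions. In this simplified sketch a partially generated value can match a gold value early and then stop matching; a real system would need to handle that case.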
Taxonomy
Topics: Speech and dialogue systems · AI in Service Interactions · Team Dynamics and Performance
Methods: Dynamic Sparse Training · Focus
