Reward-Balancing for Statistical Spoken Dialogue Systems using Multi-objective Reinforcement Learning
Stefan Ultes, Paweł Budzianowski, Iñigo Casanueva, Nikola Mrkšić, Lina Rojas-Barahona, Pei-Hao Su, Tsung-Hsien Wen, Milica Gašić, Steve Young

TL;DR
This paper introduces a multi-objective reinforcement learning approach to optimize reward component weights in spoken dialogue systems, effectively balancing success and length to improve dialogue policy performance across multiple domains.
Contribution
The paper presents a structured method for finding the optimal weighting of reward components, using multi-objective RL to reduce the number of training dialogues that the weight search requires.
Findings
Optimized reward weights improve dialogue success rates.
Multi-objective RL reduces the number of training dialogues needed for the weight search.
Method outperforms default baseline in six domains.
Abstract
Reinforcement learning is widely used for dialogue policy optimization where the reward function often consists of more than one component, e.g., the dialogue success and the dialogue length. In this work, we propose a structured method for finding a good balance between these components by searching for the optimal reward component weighting. To render this search feasible, we use multi-objective reinforcement learning to significantly reduce the number of training dialogues required. We apply our proposed method to find optimized component weights for six domains and compare them to a default baseline.
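To make the idea of a reward component weighting concrete, the following is a minimal sketch that assumes the two components named in the abstract (dialogue success and dialogue length) are combined linearly. The function name, the weight values, and the two example dialogues are hypothetical illustrations, not values or code taken from the paper.

```python
def scalarized_reward(success: bool, num_turns: int,
                      w_success: float, w_length: float) -> float:
    """Combine two reward components linearly (assumed form, not the paper's code)."""
    success_component = 1.0 if success else 0.0
    length_component = -float(num_turns)  # longer dialogues are penalized
    return w_success * success_component + w_length * length_component


# Two hypothetical dialogues: (success flag, number of turns).
LONG_SUCCESS = (True, 15)
SHORT_FAILURE = (False, 5)


def preferred(w_success: float, w_length: float) -> str:
    """Return which hypothetical dialogue earns the higher reward under the given weights."""
    r_long = scalarized_reward(*LONG_SUCCESS, w_success, w_length)
    r_short = scalarized_reward(*SHORT_FAILURE, w_success, w_length)
    return "long_success" if r_long > r_short else "short_failure"


if __name__ == "__main__":
    # A large success weight favors finishing the task even if it takes longer.
    print(preferred(w_success=20.0, w_length=1.0))
    # A small success weight makes a short failed dialogue look better.
    print(preferred(w_success=5.0, w_length=1.0))
```

Under these assumed numbers, changing only the weights flips which dialogue the reward prefers, which is why the choice of weighting shapes the learned policy and why a structured search over weightings can matter.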
