Reward-Balancing for Statistical Spoken Dialogue Systems using Multi-objective Reinforcement Learning

Stefan Ultes; Paweł Budzianowski; Iñigo Casanueva; Nikola Mrkšić; Lina Rojas-Barahona; Pei-Hao Su; Tsung-Hsien Wen; Milica Gašić; Steve Young

arXiv:1707.06299 · cs.CL · July 21, 2017

TL;DR

This paper introduces a multi-objective reinforcement learning approach to optimize reward component weights in spoken dialogue systems, effectively balancing success and length to improve dialogue policy performance across multiple domains.

Contribution

It presents a structured method for searching over reward component weightings, using multi-objective RL to reduce the number of training dialogues that the search requires.

Findings

01. Optimized reward weights improve dialogue success rates.

02. Multi-objective RL reduces training dialogues needed.

03. Method outperforms default baseline in six domains.

Abstract

Reinforcement learning is widely used for dialogue policy optimization where the reward function often consists of more than one component, e.g., the dialogue success and the dialogue length. In this work, we propose a structured method for finding a good balance between these components by searching for the optimal reward component weighting. To render this search feasible, we use multi-objective reinforcement learning to significantly reduce the number of training dialogues required. We apply our proposed method to find optimized component weights for six domains and compare them to a default baseline.
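
The idea of weighting reward components can be illustrated with a minimal sketch, assuming a linear combination of a binary success signal and a per-turn length penalty. The function names, the weight grid, and the scoring callback below are hypothetical and are not taken from the paper. The sketch also shows only the weighting and the search over candidates, not the multi-objective RL procedure that the abstract credits with making the search affordable.

```python
from typing import Callable, List, Tuple

Weighting = Tuple[float, float]  # (w_success, w_length), illustrative only


def scalarized_reward(success: bool, num_turns: int,
                      w_success: float, w_length: float) -> float:
    """Combine the two reward components into one scalar.

    Assumption: success contributes positively and every turn costs
    w_length, so longer dialogues receive a lower reward.
    """
    return w_success * float(success) - w_length * num_turns


def candidate_weightings(steps: int) -> List[Weighting]:
    """Evenly spaced weight pairs whose two entries sum to 1."""
    return [(i / steps, 1 - i / steps) for i in range(steps + 1)]


def best_weighting(candidates: List[Weighting],
                   score: Callable[[Weighting], float]) -> Weighting:
    """Return the candidate with the highest score.

    `score` is a caller-supplied placeholder, for example the measured
    success rate of a policy evaluated under that weighting.
    """
    return max(candidates, key=score)
```

In practice, `score` is the expensive part: a naive search would train and evaluate one policy per candidate weighting, which is the cost the abstract says multi-objective reinforcement learning is used to reduce.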
