What Does The User Want? Information Gain for Hierarchical Dialogue Policy Optimisation

Christian Geishauser; Songbo Hu; Hsien-chin Lin; Nurul Lubis; Michael Heck; Shutong Feng; Carel van Niekerk; Milica Gašić

arXiv:2109.07129·cs.LG·September 16, 2021

TL;DR

This paper introduces FeudalGain, a hierarchical dialogue policy algorithm that uses an intrinsic reward based on information gain. The reward makes learning more efficient and stable, and the algorithm achieves state-of-the-art results in most PyDial environments.

Contribution

It proposes an information gain-based intrinsic reward for hierarchical dialogue management, enhancing learning efficiency and stability in reinforcement learning-based dialogue policies.

Findings

01 FeudalGain achieves state-of-the-art results in most PyDial environments.

02 The approach improves sample efficiency and stability.

03 Human trials confirm effectiveness in real-world scenarios.

Abstract

The dialogue management component of a task-oriented dialogue system is typically optimised via reinforcement learning (RL). Optimisation via RL is highly susceptible to sample inefficiency and instability. The hierarchical approach called Feudal Dialogue Management takes a step towards more efficient learning by decomposing the action space. However, it still suffers from instability due to the reward only being provided at the end of the dialogue. We propose the usage of an intrinsic reward based on information gain to address this issue. Our proposed reward favours actions that resolve uncertainty or query the user whenever necessary. It enables the policy to learn how to retrieve the users' needs efficiently, which is an integral aspect in every task-oriented conversation. Our algorithm, which we call FeudalGain, achieves state-of-the-art results in most environments of the PyDial…
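The idea of rewarding actions that resolve uncertainty can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state how information gain is measured, so the example assumes a KL divergence between the belief over a slot's values before and after a system action. The function names, the "food" slot, and the belief values are all hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same values."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def information_gain_reward(belief_before, belief_after):
    """Hypothetical intrinsic reward: how far the belief moved after one action.

    Assumption: the divergence is KL(after || before); the paper may use a
    different measure.
    """
    return kl_divergence(belief_after, belief_before)


# Belief over a hypothetical "food" slot: [italian, chinese, indian].
uniform = [1 / 3, 1 / 3, 1 / 3]

# After the system asks about food and the user answers, the belief sharpens.
after_request = [0.9, 0.05, 0.05]

# After an action that reveals almost nothing, the belief barely changes.
after_other = [0.34, 0.33, 0.33]

r_request = information_gain_reward(uniform, after_request)
r_other = information_gain_reward(uniform, after_other)

print(f"reward for informative request: {r_request:.3f}")
print(f"reward for uninformative action: {r_other:.5f}")
```

Under these assumptions the informative request earns a much larger reward than the uninformative action, and the reward arrives at the turn where it was earned rather than at the end of the dialogue.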

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning