Feudal Reinforcement Learning for Dialogue Management in Large Domains

Iñigo Casanueva; Paweł Budzianowski; Pei-Hao Su; Stefan Ultes; Lina Rojas-Barahona; Bo-Hsiang Tseng; Milica Gašić

arXiv:1803.03232 · cs.CL · March 9, 2018

TL;DR

This paper introduces a Feudal Reinforcement Learning architecture for dialogue management that scales to large domains by decomposing each decision into two steps and using the domain ontology to abstract the dialogue state, outperforming the previous state of the art.

Contribution

The paper presents a novel Feudal RL-based dialogue management system that leverages domain ontology and hierarchical decision-making to handle large-scale dialogue domains effectively.

Findings

01 Significantly outperforms previous state-of-the-art methods.

02 Does not require additional reward signals.

03 Enhances scalability in large dialogue domains.

Abstract

Reinforcement learning (RL) is a promising approach to solving dialogue policy optimisation. Traditional RL algorithms, however, fail to scale to large domains due to the curse of dimensionality. We propose a novel Dialogue Management architecture, based on Feudal RL, which decomposes the decision into two steps: a first step where a master policy selects a subset of primitive actions, and a second step where a primitive action is chosen from the selected subset. The structural information included in the domain ontology is used to abstract the dialogue state space, taking the decisions at each step using different parts of the abstracted state. This, combined with an information sharing mechanism between slots, increases the scalability to large domains. We show that an implementation of this approach, based on Deep-Q Networks, significantly outperforms previous state of the art in…
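
The two-step decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `feudal_select`, the helper `linear_q`, the subset names, the action names, and the toy weights are all hypothetical, and simple linear scorers stand in for the Deep-Q Networks mentioned above. The sketch only shows the control flow: a master policy picks a subset of primitive actions, then a sub-policy picks one action from that subset, each step reading its own part of the abstracted state.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical Q-function type: maps a feature vector to one score per option.
QFunction = Callable[[Sequence[float]], List[float]]


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (first index wins ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def linear_q(weights: List[List[float]]) -> QFunction:
    """Toy stand-in for a Q-network: one weight row per option."""
    def q(features: Sequence[float]) -> List[float]:
        return [sum(w * f for w, f in zip(row, features)) for row in weights]
    return q


def feudal_select(
    master_q: QFunction,
    sub_q: Dict[str, QFunction],
    subsets: Dict[str, List[str]],
    master_features: Sequence[float],
    sub_features: Dict[str, Sequence[float]],
) -> str:
    """Greedy two-step action selection (assumed structure, for illustration)."""
    subset_names = list(subsets)
    # Step 1: the master policy scores each subset of primitive actions.
    chosen = subset_names[argmax(master_q(master_features))]
    # Step 2: the sub-policy for the chosen subset scores only its own actions,
    # using its own part of the abstracted state.
    scores = sub_q[chosen](sub_features[chosen])
    return subsets[chosen][argmax(scores)]


# Hypothetical action subsets and toy weights.
subsets = {
    "slot_independent": ["hello", "bye", "inform"],
    "slot_dependent": ["request_food", "confirm_food", "request_area"],
}
master_q = linear_q([[1.0, 0.0], [0.0, 1.0]])
sub_q = {
    "slot_independent": linear_q([[0.1], [0.2], [0.9]]),
    "slot_dependent": linear_q([[0.5], [0.3], [0.8]]),
}
sub_features = {"slot_independent": [1.0], "slot_dependent": [1.0]}

action = feudal_select(master_q, sub_q, subsets, [0.2, 0.7], sub_features)
print(action)  # request_area
```

Because each sub-policy only scores the actions in its own subset, the number of outputs any single scorer has to handle stays small even when the full primitive action set is large, which is the scalability argument the abstract makes.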
