Sample Efficient Deep Reinforcement Learning for Dialogue Systems with Large Action Spaces

Gellért Weisz; Paweł Budzianowski; Pei-Hao Su; Milica Gašić

arXiv:1802.03753·cs.CL·February 13, 2018

TL;DR

This paper applies ACER (actor-critic with experience replay), an off-policy deep reinforcement learning algorithm, to dialogue systems with large action spaces, demonstrating faster training and better performance than existing methods.

Contribution

It extends the ACER algorithm to large action spaces in dialogue systems, improving sample efficiency and training speed in complex environments.

Findings

1. ACER outperforms the current state-of-the-art methods in dialogue policy optimization.
2. The method trains significantly faster in large action spaces.
3. Application to complex environments is feasible with improved efficiency.

Abstract

In spoken dialogue systems, we aim to deploy artificial intelligence to build automated dialogue agents that can converse with humans. Part of this effort is the policy optimisation task, which attempts to find a policy describing how to respond to humans, in the form of a function that takes the current state of the dialogue and returns the response of the system. In this paper, we investigate deep reinforcement learning approaches to solve this problem. Particular attention is given to actor-critic methods, off-policy reinforcement learning with experience replay, and various methods aimed at reducing the bias and variance of estimators. When combined, these methods result in the previously proposed ACER algorithm that gave competitive results in gaming environments. These environments, however, are fully observable and have a relatively small action set, so in this paper we examine the…
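The abstract's combination of off-policy learning from replayed experience with bias and variance reduction can be illustrated by the Retrace-style critic target used in ACER, which truncates importance weights so that corrections for the gap between the behaviour policy and the current policy stay bounded. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a single replayed trajectory, a discrete action set, critic outputs that have already been computed, and a truncation constant `c` of 1. The function `retrace_targets` and its argument names are hypothetical.

```python
from typing import List, Sequence


def retrace_targets(
    rewards: Sequence[float],
    q_taken: Sequence[float],
    values: Sequence[float],
    rho_taken: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,
    c: float = 1.0,
) -> List[float]:
    """Retrace-style Q targets with truncated importance weights (hypothetical sketch).

    rewards[t]      -- reward r_t received after taking a_t in state x_t
    q_taken[t]      -- critic estimate Q(x_t, a_t)
    values[t]       -- V(x_t) = sum_a pi(a | x_t) Q(x_t, a) under the current policy pi
    rho_taken[t]    -- importance ratio pi(a_t | x_t) / mu(a_t | x_t), where mu is
                       the behaviour policy that generated the replayed transition
    bootstrap_value -- V(x_T) for the state after the last step (0.0 if terminal)
    """
    targets = [0.0] * len(rewards)
    # carry holds the corrected estimate for the next state:
    # rho_bar_{t+1} * (Q_ret_{t+1} - Q(x_{t+1}, a_{t+1})) + V(x_{t+1})
    carry = bootstrap_value
    for t in reversed(range(len(rewards))):
        q_ret = rewards[t] + gamma * carry
        targets[t] = q_ret
        # Truncating the ratio at c bounds the variance of the off-policy correction.
        rho_bar = min(c, rho_taken[t])
        carry = rho_bar * (q_ret - q_taken[t]) + values[t]
    return targets
```

With every importance ratio equal to 1 and a zero critic, the targets reduce to ordinary discounted returns; when a ratio is truncated to 0, the trace is cut and the target at the preceding step bootstraps from V at that state instead.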

Taxonomy

Methods: Experience Replay · Retrace · Trust Region Policy Optimization · Entropy Regularization · Stochastic Dueling Network · Dense Connections · Softmax · Convolution · ACER