A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management

Iñigo Casanueva; Paweł Budzianowski; Pei-Hao Su; Nikola Mrkšić; Tsung-Hsien Wen; Stefan Ultes; Lina Rojas-Barahona; Steve Young; Milica Gašić

arXiv:1711.11023 · stat.ML · April 9, 2018 · 36 cites

TL;DR

This paper introduces a benchmarking environment for reinforcement learning-based dialogue management, enabling fair comparison and evaluation of different models in simulated settings.

Contribution

It proposes a set of challenging simulated environments and provides baseline RL algorithms, facilitating reproducibility and standardized evaluation in dialogue management research.

Findings

1. The deep RL algorithms DQN, A2C, and Natural Actor-Critic were evaluated as parametric baselines.

2. A non-parametric model, GP-SARSA, was tested for comparison.

3. The environments and models are publicly available for further research.

Abstract

Dialogue assistants are rapidly becoming an indispensable daily aid. To avoid the significant effort needed to hand-craft the required dialogue flow, the Dialogue Management (DM) module can be cast as a continuous Markov Decision Process (MDP) and trained through Reinforcement Learning (RL). Several RL models have been investigated over recent years. However, the lack of a common benchmarking framework makes it difficult to perform a fair comparison between different models and their capability to generalise to different environments. Therefore, this paper proposes a set of challenging simulated environments for dialogue model development and evaluation. To provide some baselines, we investigate a number of representative parametric algorithms, namely the deep reinforcement learning algorithms DQN, A2C and Natural Actor-Critic, and compare them to a non-parametric model, GP-SARSA. Both the…
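To make the "dialogue management as an MDP trained through RL" framing concrete, here is a minimal sketch on a toy slot-filling dialogue. It is not the paper's environment or code: the state here is a discrete count of filled slots rather than a continuous belief state, tabular Q-learning is swapped in for the deep and Gaussian-process models the abstract names, and every name and number (`N_SLOTS`, `REQUEST`, `INFORM`, `step`, `train`, the reward values, the hyperparameters) is a hypothetical choice made for illustration.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
N_SLOTS = 2             # slots the system must fill before it can answer
REQUEST, INFORM = 0, 1  # two hypothetical system actions
MAX_TURNS = 10          # dialogues are cut off after this many turns


def step(state, action):
    """Toy simulated user. Returns (next_state, reward, done)."""
    if action == INFORM:
        # Informing ends the dialogue; it succeeds only if every slot is filled.
        success = state == N_SLOTS
        return state, (20.0 if success else 0.0) - 1.0, True
    # Requesting costs one turn and fills one more slot (the user always answers).
    return min(state + 1, N_SLOTS), -1.0, False


def train(episodes=2000, alpha=0.1, gamma=0.99, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_SLOTS + 1)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(MAX_TURNS):
            if rng.random() < epsilon:
                action = rng.choice((REQUEST, INFORM))
            else:
                action = max((REQUEST, INFORM), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: no future value once the dialogue has ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action per state according to the learned Q-table."""
    return [max((REQUEST, INFORM), key=lambda a: q[s][a]) for s in range(len(q))]


if __name__ == "__main__":
    names = {REQUEST: "request", INFORM: "inform"}
    print([names[a] for a in greedy_policy(train())])
```

Under these assumptions the learned greedy policy requests until both slots are filled and only then informs, which is the hand-crafted dialogue flow the RL formulation is meant to replace. A benchmark of the kind the abstract describes would vary the environment (domain, noise, user behaviour) while holding the learning algorithm fixed, so that models can be compared fairly.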

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · AI in Service Interactions

Methods: Convolution · Dense Connections · Q-Learning · Deep Q-Network · A2C