A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management
Iñigo Casanueva, Paweł Budzianowski, Pei-Hao Su, Nikola Mrkšić, Tsung-Hsien Wen, Stefan Ultes, Lina Rojas-Barahona, Steve Young, Milica Gašić

TL;DR
This paper introduces a benchmarking environment for reinforcement learning-based dialogue management, enabling fair comparison and evaluation of different models in simulated settings.
Contribution
It proposes a set of challenging simulated environments and provides baseline RL algorithms, facilitating reproducibility and standardized evaluation in dialogue management research.
Findings
Three deep RL algorithms (DQN, A2C and Natural Actor-Critic) were evaluated as parametric baselines.
A non-parametric model, GP-SARSA, was evaluated for comparison.
The environments and models are publicly available for further research.
Abstract
Dialogue assistants are rapidly becoming an indispensable daily aid. To avoid the significant effort needed to hand-craft the required dialogue flow, the Dialogue Management (DM) module can be cast as a continuous Markov Decision Process (MDP) and trained through Reinforcement Learning (RL). Several RL models have been investigated over recent years. However, the lack of a common benchmarking framework makes it difficult to perform a fair comparison between different models and their capability to generalise to different environments. Therefore, this paper proposes a set of challenging simulated environments for dialogue model development and evaluation. To provide some baselines, we investigate a number of representative parametric algorithms, namely the deep reinforcement learning algorithms DQN, A2C and Natural Actor-Critic, and compare them to a non-parametric model, GP-SARSA. Both the…
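To make the idea of casting dialogue management as an MDP concrete, here is a minimal sketch under stated assumptions. It is not the paper's environments or any of its baselines: the two-slot task, the `ToyDialogueEnv` class, the action names, the error rate, and the reward values (-1 per turn, +20 on success) are all hypothetical choices made for illustration, and plain tabular Q-learning is swapped in for the deep and Gaussian-process algorithms named in the abstract.

```python
import random

# Hypothetical toy task: the system must fill two slots, then make an offer.
REQUEST_0, REQUEST_1, INFORM = 0, 1, 2
ACTIONS = (REQUEST_0, REQUEST_1, INFORM)


class ToyDialogueEnv:
    """Simulated user whose answers are lost with probability `error_rate`."""

    def __init__(self, error_rate=0.2, max_turns=20, rng=None):
        self.error_rate = error_rate
        self.max_turns = max_turns
        self.rng = rng or random.Random(0)

    def reset(self):
        self.filled = [False, False]
        self.turn = 0
        return tuple(self.filled)

    def step(self, action):
        """Return (next_state, reward, done, success)."""
        self.turn += 1
        reward = -1.0  # per-turn penalty (assumed value)
        if action == INFORM:
            success = all(self.filled)
            if success:
                reward += 20.0  # success bonus (assumed value)
            return tuple(self.filled), reward, True, success
        # A request fills its slot unless the simulated channel drops the answer.
        if self.rng.random() >= self.error_rate:
            self.filled[action] = True
        done = self.turn >= self.max_turns
        return tuple(self.filled), reward, done, False


def greedy(q, state):
    """Pick the action with the highest estimated value in `state`."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def train(env, episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            explore = rng.random() < epsilon
            action = rng.choice(ACTIONS) if explore else greedy(q, state)
            nxt, reward, done, _ = env.step(action)
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q


def evaluate(env, q, dialogues=500):
    """Fraction of simulated dialogues the greedy policy completes successfully."""
    wins = 0
    for _ in range(dialogues):
        state, done, success = env.reset(), False, False
        while not done:
            state, _, done, success = env.step(greedy(q, state))
        wins += success
    return wins / dialogues


env = ToyDialogueEnv(error_rate=0.2)
q_table = train(env)
success_rate = evaluate(env, q_table)
print(f"greedy success rate: {success_rate:.2f}")
```

Raising `error_rate` gives a rough feel for why a benchmark would vary the noise level across environments: the same learner has to spend more turns confirming slots before it can safely make an offer.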
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · AI in Service Interactions
Methods: Convolution · Dense Connections · Q-Learning · Deep Q-Network · A2C
