Policy Networks with Two-Stage Training for Dialogue Systems
Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, Kaheer Suleman

TL;DR
This paper introduces a deep policy network trained with an advantage actor-critic method for dialogue systems, demonstrating efficient learning from limited data and outperforming Gaussian Process methods in a restaurant domain.
Contribution
It presents a novel deep RL approach that trains directly on the original state and action spaces, reducing pre-engineering effort and improving data efficiency for dialogue systems.
Findings
Deep RL outperforms Gaussian Process methods on summary spaces.
Efficient training with only a few hundred dialogues.
Faster convergence to optimal policies compared to other deep RL methods.
Abstract
In this paper, we propose to use deep policy networks which are trained with an advantage actor-critic method for statistically optimised dialogue systems. First, we show that, on summary state and action spaces, deep Reinforcement Learning (RL) outperforms Gaussian Process methods. Summary state and action spaces lead to good performance but require pre-engineering effort, RL knowledge, and domain expertise. In order to remove the need to define such summary spaces, we show that deep RL can also be trained efficiently on the original state and action spaces. Dialogue systems based on partially observable Markov decision processes are known to require many dialogues to train, which makes them unappealing for practical deployment. We show that a deep RL method based on an actor-critic architecture can exploit a small amount of data very efficiently. Indeed, with only a few hundred…
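The abstract names an advantage actor-critic method but gives no update rule. As a rough illustration only, the sketch below shows the generic one-step advantage actor-critic update, where the critic's estimate V(s) serves as a baseline and the advantage is A = r + γV(s') − V(s). It is not the authors' implementation: the two-slot "request/offer" task, the tabular softmax actor (standing in for a deep policy network), the reward values, and every function name and learning rate here are hypothetical assumptions chosen to keep the example small.

```python
import math
import random

# HYPOTHETICAL toy task, not the paper's restaurant domain:
# state s = number of slots filled (0..N_SLOTS); action 0 = request a slot,
# action 1 = offer a result. Offering always ends the dialogue.
N_SLOTS = 2
N_ACTIONS = 2
GAMMA = 0.95


def step(state, action):
    """Return (next_state, reward, done) for the toy task."""
    if action == 0:  # request: fill one more slot at a per-turn cost
        return min(state + 1, N_SLOTS), -1.0, False
    # offer: succeeds only if every slot has been filled
    return state, (20.0 if state == N_SLOTS else -10.0), True


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def train_a2c(episodes=2000, actor_lr=0.1, critic_lr=0.2, max_turns=10, seed=0):
    """Tabular one-step advantage actor-critic (assumed hyperparameters)."""
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_SLOTS + 1)]  # actor preferences
    value = [0.0] * (N_SLOTS + 1)                            # critic V(s)
    for _ in range(episodes):
        state = 0
        for _ in range(max_turns):
            probs = softmax(theta[state])
            action = rng.choices(range(N_ACTIONS), weights=probs)[0]
            next_state, reward, done = step(state, action)
            # One-step advantage estimate: A = r + gamma * V(s') - V(s)
            target = reward if done else reward + GAMMA * value[next_state]
            advantage = target - value[state]
            # Critic: move V(s) toward the bootstrapped target
            value[state] += critic_lr * advantage
            # Actor: gradient of log softmax w.r.t. preferences is one_hot(a) - pi(.|s)
            for a in range(N_ACTIONS):
                grad = (1.0 if a == action else 0.0) - probs[a]
                theta[state][a] += actor_lr * advantage * grad
            if done:
                break
            state = next_state
    return theta, value


def greedy_policy(theta):
    """Most preferred action in each state."""
    return [max(range(N_ACTIONS), key=lambda a: prefs[a]) for prefs in theta]
```

Running `theta, value = train_a2c()` and then `greedy_policy(theta)` should recover the request, request, offer behaviour that is optimal for this toy task. Subtracting V(s) does not change the expected policy gradient, but it reduces its variance, which is the usual motivation for the advantage form; the data-efficiency claims in the abstract concern the authors' deep networks and are not demonstrated by this sketch.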
