Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning

Yuexin Wu; Xiujun Li; Jingjing Liu; Jianfeng Gao; Yiming Yang

arXiv:1811.07550·cs.CL·November 20, 2018·5 cites

Open Access · 1 Repo

TL;DR

This paper introduces Switch-DDQ, an extension of Deep Dyna-Q for task-completion dialogue agents in which a switcher automatically decides whether to use a real or a simulated experience for Q-learning, with the aim of improving training efficiency and performance.

Contribution

It proposes a switcher mechanism and active learning to optimize the use of real and simulated experiences in Deep Dyna-Q for dialogue policy learning.

Findings

01 Switch-DDQ outperforms DDQ and Q-learning baselines in simulations.

02 The switcher effectively balances real and simulated experiences, replacing a pre-specified ratio.

03 Active learning enhances sample efficiency in training.

Abstract

Training task-completion dialogue agents with reinforcement learning usually requires a large number of real user experiences. The Dyna-Q algorithm extends Q-learning by integrating a world model, and thus can effectively boost training efficiency using simulated experiences generated by the world model. The effectiveness of Dyna-Q, however, depends on the quality of the world model, or implicitly, on the pre-specified ratio of real vs. simulated experiences used for Q-learning. To this end, we extend the recently proposed Deep Dyna-Q (DDQ) framework by integrating a switcher that automatically determines whether to use a real or simulated experience for Q-learning. Furthermore, we explore the use of active learning for improving sample efficiency, by encouraging the world model to generate simulated experiences in regions of the state-action space that the agent has not (fully) explored. Our…
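
The two ideas in the abstract, a switcher that picks between real and simulated experience and active sampling of under-explored state-action pairs, can be illustrated with a toy tabular sketch. This is not the paper's implementation: the abstract does not say how the switcher decides, so the threshold rule in `choose_source`, the `model_quality` score, the visit-count heuristic in `least_explored`, and the state and action names are all assumptions made for illustration.

```python
from collections import defaultdict


def choose_source(model_quality, threshold=0.5):
    # Placeholder switcher rule (assumption): trust simulated experience
    # only when an estimated world-model quality score is high enough.
    return "simulated" if model_quality >= threshold else "real"


def least_explored(visit_counts, candidates):
    # Active-learning stand-in (assumption): prefer the state-action pair
    # that has been visited the fewest times so far.
    return min(candidates, key=lambda pair: visit_counts[pair])


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    # Standard one-step Q-learning update on a tabular Q function.
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


q = defaultdict(float)
visits = defaultdict(int)
actions = ["request", "inform"]  # hypothetical dialogue actions
visits[("s0", "request")] = 3    # "request" was already tried in state s0

# Pick the less-explored pair, decide where the experience comes from,
# then apply the same Q-learning update in either case.
target = least_explored(visits, [("s0", a) for a in actions])
source = choose_source(model_quality=0.8)
q_update(q, "s0", target[1], 1.0, "s1", actions)
```

In this sketch the source of an experience only affects where the `(state, action, reward, next_state)` tuple comes from; the Q-learning update itself is unchanged, which is the general shape of Dyna-style planning.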

Code & Models

Repositories

CrickWu/Swtich-DDQ (Official)

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Reinforcement Learning in Robotics