Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning
Yuexin Wu, Xiujun Li, Jingjing Liu, Jianfeng Gao, Yiming Yang

TL;DR
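This paper introduces Switch-DDQ, an extension of Deep Dyna-Q (DDQ) for training task-completion dialogue agents with reinforcement learning. A switcher automatically decides whether each Q-learning update uses real or simulated experience, which improves training efficiency and dialogue policy performance.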
Contribution
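The paper adds two components to Deep Dyna-Q for dialogue policy learning: a switcher that chooses between real and simulated experiences for Q-learning, and an active-learning strategy that steers the world model toward simulating under-explored parts of the state-action space.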
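The switcher's role can be illustrated with a minimal Python sketch. It assumes (an assumption for illustration, not the paper's specification) that the switcher relies on a discriminator scoring, in [0, 1], how realistic each recent simulated experience looks, and that training draws simulated experience only while the average score clears a threshold. The names `switcher`, `next_experience`, `realism_scores`, and `threshold` are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

# Hypothetical experience tuple: (state, action, reward, next_state)
Experience = Tuple[int, int, float, int]


def switcher(realism_scores: Sequence[float], threshold: float = 0.5) -> str:
    """Hypothetical switcher: return "simulated" if the world model's recent
    outputs look realistic on average (per an assumed discriminator whose
    scores lie in [0, 1]), otherwise "real"."""
    if not realism_scores:
        return "real"  # nothing simulated to judge yet, so rely on real data
    return "simulated" if mean(realism_scores) >= threshold else "real"


def next_experience(
    real_buffer: List[Experience],
    sim_buffer: List[Experience],
    realism_scores: Sequence[float],
    rng: random.Random,
    threshold: float = 0.5,
) -> Tuple[str, Experience]:
    """Draw one experience for the next Q-learning update from the buffer the
    switcher selects; fall back to real data if the simulated buffer is empty."""
    source = switcher(realism_scores, threshold)
    if source == "simulated" and sim_buffer:
        return source, rng.choice(sim_buffer)
    return "real", rng.choice(real_buffer)


if __name__ == "__main__":
    rng = random.Random(0)
    real = [(0, 1, 0.0, 1), (1, 1, 1.0, 2)]
    sim = [(0, 1, 0.0, 1)]
    print(switcher([0.9, 0.7, 0.8]))  # simulated
    print(switcher([0.1, 0.3]))       # real
    print(next_experience(real, sim, [0.2], rng)[0])  # real
```

A hard threshold is the simplest possible rule; mixing real and simulated samples with a probability driven by the score would be an equally plausible reading of "switcher".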
Findings
Switch-DDQ outperforms DDQ and Q-learning baselines in simulations.
The switcher effectively adjusts the ratio of real to simulated experiences used for training.
Active learning enhances sample efficiency in training.
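The active-learning idea of steering simulation toward under-explored state-action pairs can be sketched with a simple count-based heuristic. The inverse-visit-count weighting, the function name `pick_planning_pairs`, and the example dialogue states and actions are assumptions for illustration; the exact criterion Switch-DDQ uses is not given in this summary.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]  # (state, action)


def pick_planning_pairs(candidates: Sequence[Pair], visits: Counter,
                        k: int, rng: random.Random) -> List[Pair]:
    """Sample k (state, action) pairs for the world model to simulate.

    Each candidate is weighted by 1 / (1 + number of real visits), so pairs
    the agent has rarely or never tried are simulated more often.
    """
    weights = [1.0 / (1 + visits[p]) for p in candidates]
    return rng.choices(list(candidates), weights=weights, k=k)


if __name__ == "__main__":
    # Hypothetical dialogue states/actions, for illustration only.
    visits = Counter({("greet", "ask_date"): 99})  # well explored
    candidates = [("greet", "ask_date"), ("greet", "ask_theater")]  # second never tried
    picks = Counter(pick_planning_pairs(candidates, visits, k=1000, rng=random.Random(0)))
    print(picks)  # the unexplored pair dominates the sample
```

Sampling in proportion to the weights, rather than always picking the least-visited pair, keeps some simulation of well-explored pairs so the world model's coverage does not collapse onto a single region.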
Abstract
Training task-completion dialogue agents with reinforcement learning usually requires a large number of real user experiences. The Dyna-Q algorithm extends Q-learning by integrating a world model, and thus can effectively boost training efficiency using simulated experiences generated by the world model. The effectiveness of Dyna-Q, however, depends on the quality of the world model (or, implicitly, on the pre-specified ratio of real vs. simulated experiences used for Q-learning). To this end, we extend the recently proposed Deep Dyna-Q (DDQ) framework by integrating a switcher that automatically determines whether to use a real or simulated experience for Q-learning. Furthermore, we explore the use of active learning for improving sample efficiency, by encouraging the world model to generate simulated experiences in the state-action space where the agent has not (fully) explored. Our…
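The Dyna-Q loop described above can be illustrated with classic tabular Dyna-Q in Python. This is a textbook sketch, not DDQ itself: DDQ replaces the Q-table and the lookup-table world model with neural networks and simulates a user rather than a toy environment. Here `planning_steps` (simulated updates per real step) plays the role of the pre-specified real-vs-simulated ratio; the five-state `chain_step` environment and all hyperparameters are hypothetical.

```python
import random
from collections import defaultdict


def q_update(Q, s, a, r, s2, done, actions, alpha, gamma):
    """One-step Q-learning update, shared by real and simulated experiences."""
    target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def dyna_q(env_step, start_state, actions, episodes=50, planning_steps=5,
           alpha=0.5, gamma=0.95, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    model = {}  # world model: (s, a) -> (r, s2, done), learned from real steps
    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            # Epsilon-greedy action; ties among greedy actions broken randomly.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            r, s2, done = env_step(s, a)                           # real experience
            q_update(Q, s, a, r, s2, done, actions, alpha, gamma)  # direct RL
            model[(s, a)] = (r, s2, done)                          # world-model learning
            # Planning: replay simulated experiences drawn from the world model.
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                q_update(Q, ps, pa, pr, ps2, pdone, actions, alpha, gamma)
            if done:
                break
            s = s2
    return Q


def chain_step(s, a):
    """Toy deterministic environment: states 0..4, actions -1/+1, reward 1 at state 4."""
    s2 = min(max(s + a, 0), 4)
    return (1.0, s2, True) if s2 == 4 else (0.0, s2, False)


if __name__ == "__main__":
    Q = dyna_q(chain_step, start_state=0, actions=[-1, 1])
    print(max([-1, 1], key=lambda b: Q[(0, b)]))  # greedy action at state 0: 1
```

Setting `planning_steps=0` reduces this to plain Q-learning; raising it trades real interactions for simulated ones, which only helps while the world model stays accurate. That fixed trade-off is what the switcher is meant to make adaptive.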
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Reinforcement Learning in Robotics
