Budgeted Policy Learning for Task-Oriented Dialogue Systems
Zhirui Zhang, Xiujun Li, Jianfeng Gao, Enhong Chen

TL;DR
This paper introduces a budget-aware learning method for task-oriented dialogue systems that schedules a small, fixed number of user interactions across training to improve task success rates.
Contribution
It extends Deep Dyna-Q (DDQ) with a Budget-Conscious Scheduling (BCS) framework for learning under a fixed interaction budget. BCS comprises a Poisson-based global scheduler, a controller that chooses between real and simulated experience at each training step, and a user goal sampling module.
Findings
Significant success rate improvements over state-of-the-art baselines under the same fixed budget.
More effective use of a small, fixed number of user interactions.
Improvements hold on a movie-ticket booking task with both simulated and real users.
Abstract
This paper presents a new approach that extends Deep Dyna-Q (DDQ) by incorporating Budget-Conscious Scheduling (BCS) to best utilize a fixed, small number of user interactions (the budget) for learning task-oriented dialogue agents. BCS consists of (1) a Poisson-based global scheduler that allocates the budget over different stages of training; (2) a controller that decides at each training step whether the agent is trained using real or simulated experiences; and (3) a user goal sampling module that generates the experiences that are most effective for policy learning. Experiments on a movie-ticket booking task with simulated and real users show that our approach leads to significant improvements in success rate over state-of-the-art baselines given the fixed budget.
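As a rough illustration of what a Poisson-shaped budget allocation over training stages could look like, the sketch below splits a total interaction budget across stages in proportion to a truncated Poisson probability mass function. This is not the authors' scheduler: the abstract does not give its formula, so the function names (`poisson_pmf`, `allocate_budget`), the parameter `lam`, the truncation to `num_stages` stages, and the largest-remainder rounding are all assumptions made for this example.

```python
import math
from typing import List


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability mass P(X = k) for rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def allocate_budget(total_budget: int, num_stages: int, lam: float) -> List[int]:
    """Split total_budget across stages 0..num_stages-1.

    Hypothetical scheme: each stage gets a share proportional to the
    Poisson pmf at its index, renormalized over the truncated range.
    Shares are rounded with the largest-remainder method so that the
    integer allocations sum exactly to total_budget.
    """
    weights = [poisson_pmf(k, lam) for k in range(num_stages)]
    norm = sum(weights)
    shares = [total_budget * w / norm for w in weights]

    # Start from the floor of each share, then hand out the leftover
    # interactions to the stages with the largest fractional parts.
    alloc = [math.floor(s) for s in shares]
    leftover = total_budget - sum(alloc)
    by_remainder = sorted(
        range(num_stages), key=lambda i: shares[i] - alloc[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # Illustrative values only: 300 real-user dialogues over 10 stages.
    print(allocate_budget(total_budget=300, num_stages=10, lam=3.0))
```

With a small `lam`, most of the budget lands in early stages; a larger `lam` shifts it later. The controller and the user goal sampling module described in the abstract are not modeled here.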
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning
