End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning
Jason D. Williams, Geoffrey Zweig

TL;DR
This paper introduces an end-to-end LSTM-based dialog system that can be trained with supervised learning, reinforcement learning, or both. The approach reduces manual feature engineering of dialog state and improves task-oriented dialog management.
Contribution
The paper presents a novel LSTM-based dialog control model that combines supervised and reinforcement learning for end-to-end training.
Findings
Supervised learning yields a good initial policy from only a few example dialogs.
Reinforcement learning improves the dialog policy through interaction with users.
Combining SL and RL accelerates learning and enhances performance.
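The complementarity of SL and RL in these findings can be illustrated with a toy sketch. It is not the paper's training code. It assumes a single-state softmax policy over a few actions instead of an LSTM, a simulated reward of 1 for one hypothetical "correct" action, and REINFORCE as the policy-gradient method. What it shows is that both steps follow the same gradient of log pi(a), so weights learned by SL can be handed directly to RL for further improvement. All function names and parameters are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def softmax(theta: List[float]) -> List[float]:
    """Numerically stable softmax over action preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_prob(theta: List[float], action: int) -> List[float]:
    """Gradient of log softmax(theta)[action]: onehot(action) - softmax(theta)."""
    probs = softmax(theta)
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def supervised_step(theta: List[float], expert_action: int, lr: float) -> List[float]:
    """SL: one gradient-ascent step on the log-likelihood of the expert's action."""
    g = grad_log_prob(theta, expert_action)
    return [t + lr * gk for t, gk in zip(theta, g)]


def reinforce_step(theta: List[float], reward_fn: Callable[[int], float],
                   lr: float, rng: random.Random, baseline: float = 0.0) -> List[float]:
    """RL: sample an action from the current policy and take a REINFORCE step."""
    probs = softmax(theta)
    action = rng.choices(range(len(theta)), weights=probs)[0]
    advantage = reward_fn(action) - baseline
    g = grad_log_prob(theta, action)
    return [t + lr * advantage * gk for t, gk in zip(theta, g)]


def train(n_actions: int = 4, correct: int = 2, n_expert: int = 5,
          n_rl: int = 200, lr: float = 0.5, seed: int = 0) -> Tuple[float, float]:
    """Hypothetical schedule: a few SL steps, then RL starting from the SL weights."""
    rng = random.Random(seed)
    theta = [0.0] * n_actions  # uniform initial policy
    for _ in range(n_expert):  # stand-in for a handful of expert dialogs
        theta = supervised_step(theta, correct, lr)
    p_after_sl = softmax(theta)[correct]

    def reward(action: int) -> float:  # simulated user: 1 for the right action, else 0
        return 1.0 if action == correct else 0.0

    for _ in range(n_rl):  # RL continues from the SL-initialized parameters
        theta = reinforce_step(theta, reward, lr, rng)
    p_after_rl = softmax(theta)[correct]
    return p_after_sl, p_after_rl


p_sl, p_rl = train()
print(f"P(correct) after SL: {p_sl:.3f}, after SL+RL: {p_rl:.3f}")
```

With these settings, a few SL steps lift the probability of the rewarded action above the uniform 0.25, and RL then pushes it further. Because the reward is 0 for every other action and no baseline is used, an RL update here only ever moves probability toward the rewarded action; a real system would typically subtract a baseline to reduce variance.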
Abstract
This paper presents a model for end-to-end learning of task-oriented dialog systems. The main component of the model is a recurrent neural network (an LSTM), which maps from raw dialog history directly to a distribution over system actions. The LSTM automatically infers a representation of dialog history, which relieves the system developer of much of the manual feature engineering of dialog state. In addition, the developer can provide software that expresses business rules and provides access to programmatic APIs, enabling the LSTM to take actions in the real world on behalf of the user. The LSTM can be optimized using supervised learning (SL), where a domain expert provides example dialogs which the LSTM should imitate; or using reinforcement learning (RL), where the system improves by interacting directly with end users. Experiments show that SL and RL are complementary: SL alone…
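The abstract's core pipeline (dialog history in, distribution over system actions out, constrained by developer-written business rules) can be sketched as a toy forward pass. This is an illustration under stated assumptions, not the authors' implementation: each turn is assumed to be already encoded as a small feature vector, the weights are random and untrained, biases are omitted, and business rules are assumed to take the form of a binary action mask that zeroes out disallowed actions before renormalizing. The class name `TinyLSTMPolicy`, the feature vectors, and the mask values are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMPolicy:
    """Hypothetical toy policy: LSTM over per-turn features -> masked action distribution."""

    def __init__(self, n_in: int, n_hidden: int, n_actions: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One weight matrix per gate (i, f, o, g) acting on [x_t; h_{t-1}]; biases omitted.
        self.gates = {name: rand_matrix(n_hidden, n_in + n_hidden, rng) for name in "ifog"}
        self.w_out = rand_matrix(n_actions, n_hidden, rng)

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = list(x) + list(h)
        i = [sigmoid(a) for a in matvec(self.gates["i"], z)]    # input gate
        f = [sigmoid(a) for a in matvec(self.gates["f"], z)]    # forget gate
        o = [sigmoid(a) for a in matvec(self.gates["o"], z)]    # output gate
        g = [math.tanh(a) for a in matvec(self.gates["g"], z)]  # candidate cell values
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def action_distribution(self, history: Sequence[Vector], mask: Sequence[int]) -> Vector:
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in history:  # the hidden state summarizes the dialog so far
            h, c = self.step(x, h, c)
        logits = matvec(self.w_out, h)
        allowed = [l for l, a in zip(logits, mask) if a]
        if not allowed:
            raise ValueError("action mask disallows every action")
        m = max(allowed)
        # Softmax restricted to allowed actions: equivalent to softmax, mask, renormalize.
        scores = [math.exp(l - m) if a else 0.0 for l, a in zip(logits, mask)]
        total = sum(scores)
        return [s / total for s in scores]


policy = TinyLSTMPolicy(n_in=3, n_hidden=4, n_actions=4, seed=0)
history = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # made-up per-turn feature vectors
mask = [1, 0, 1, 1]  # e.g. a business rule forbids action 1 at this turn
print(policy.action_distribution(history, mask))
```

Only the forward pass is shown. In the setting the abstract describes, the same parameters would then be optimized with SL on expert dialogs or with RL from interaction with end users.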
Taxonomy
Topics: Speech and dialogue systems · Multi-Agent Systems and Negotiation · Topic Modeling
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
