Toward Self-learning End-to-End Task-Oriented Dialog Systems
Xiaoying Zhang, Baolin Peng, Jianfeng Gao, Helen Meng

TL;DR
This paper introduces SL-AGENT, a self-learning framework that enables end-to-end task-oriented dialog systems to adapt automatically to changing environments by learning from unlabeled human-bot interactions using reinforcement learning.
Contribution
The paper proposes SL-AGENT, a self-learning framework that pairs a dialog model with a pre-trained reward model so that end-to-end task bots can adapt from unlabeled human-bot dialog logs with minimal or zero human annotation.
Findings
SL-AGENT effectively adapts to changing environments in four dialog tasks.
The framework improves dialog response quality through reinforcement learning.
Both automatic and human evaluations confirm the method's effectiveness.
Abstract
End-to-end task bots are typically learned over a static and usually limited-size corpus. However, when deployed in dynamic, changing, and open environments to interact with users, task bots tend to fail when confronted with data that deviate from the training corpus, i.e., out-of-distribution samples. In this paper, we study the problem of automatically adapting task bots to changing environments by learning from human-bot interactions with minimum or zero human annotations. We propose SL-AGENT, a novel self-learning framework for building end-to-end task bots. SL-AGENT consists of a dialog model and a pre-trained reward model to predict the quality of an agent response. It enables task bots to automatically adapt to changing environments by learning from the unlabeled human-bot dialog logs accumulated after deployment via reinforcement learning with the incorporated reward model.…
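The learning loop described in the abstract (a reward model scores agent responses, and the dialog model is updated from unlabeled logs via reinforcement learning) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a tabular softmax policy over three hypothetical canned responses, a hand-written `reward_model` stub standing in for the pre-trained reward model, and a plain REINFORCE update; every name and value below is illustrative.

```python
import math
import random

# Hypothetical candidate responses; a real task bot would generate text.
RESPONSES = ["book_table", "ask_cuisine", "say_goodbye"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_model(user_turn, response):
    """Stand-in for a pre-trained reward model (assumption: score in [0, 1])."""
    good = {
        "i want italian food": "book_table",
        "find me a restaurant": "ask_cuisine",
    }
    return 1.0 if good.get(user_turn) == response else 0.0


def reinforce_step(logits, user_turn, rng, lr=0.5):
    """One REINFORCE update on a single unlabeled user turn."""
    probs = softmax(logits[user_turn])
    # Sample a response from the current policy.
    action = rng.choices(range(len(RESPONSES)), weights=probs)[0]
    # No human label is used: the reward comes from the reward model.
    reward = reward_model(user_turn, RESPONSES[action])
    # d log pi(action) / d logit_k = 1[k == action] - probs[k]
    for k in range(len(RESPONSES)):
        grad = (1.0 if k == action else 0.0) - probs[k]
        logits[user_turn][k] += lr * reward * grad
    return reward


def train(logs, steps=200, seed=0):
    """Adapt the policy using only logged user turns and the reward model."""
    rng = random.Random(seed)
    logits = {turn: [0.0] * len(RESPONSES) for turn in logs}
    for _ in range(steps):
        reinforce_step(logits, rng.choice(logs), rng)
    return logits


logs = ["i want italian food", "find me a restaurant"]
policy = train(logs)
probs = softmax(policy["i want italian food"])
print(dict(zip(RESPONSES, (round(p, 3) for p in probs))))
```

In this toy setting the policy shifts probability toward whichever response the reward stub scores highly, without any per-turn annotation. A neural dialog model would replace the logit table, and a learned scorer would replace the lookup in `reward_model`.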
Taxonomy
Topics: Topic Modeling · Data Stream Mining Techniques · Speech and dialogue systems
Methods: Self-Learning
