Improving Multi-Domain Task-Oriented Dialogue System with Offline Reinforcement Learning
Dharmendra Prajapat, Durga Toshniwal

TL;DR
This paper introduces an improved multi-domain task-oriented dialogue system that combines supervised learning and reinforcement learning with a reward based on success rate and BLEU scores, enhancing task completion and response quality.
Contribution
The paper presents a novel approach that fine-tunes a pre-trained GPT2 model with reinforcement learning to address exposure bias and token loss in dialogue systems.
Findings
Increases inform rate by 1.60% on MultiWOZ2.1
Improves success rate by 3.17% on MultiWOZ2.1
Effectively balances task success and response fluency
Abstract
A task-oriented dialogue (TOD) system is designed to accomplish user-defined tasks through dialogue. TOD systems have progressed towards end-to-end modeling by leveraging pre-trained large language models. Fine-tuning a pre-trained language model with supervised learning alone leads to the exposure bias and token loss problems, and it steers the model away from completing the user's task. To address these issues, we propose a TOD system that uses a unified pre-trained language model, GPT2, as its base model. The model is optimized using both supervised learning and reinforcement learning (RL). The issues in the TOD system are mitigated using a non-differentiable reward function, computed as a weighted sum of the success rate and BLEU evaluation metrics. Including both metrics in the reward guides the language model toward user task completion while ensuring a…
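The weighted-sum reward described in the abstract can be illustrated with a minimal sketch. The abstract does not state the weights or the RL update rule, so everything named here is an assumption for illustration: the function names `combined_reward` and `policy_gradient_loss`, the `success_weight` parameter, the choice to make the two weights sum to 1, and the use of a REINFORCE-style loss. Both metrics are assumed to be scaled to the range [0, 1].

```python
from typing import Sequence


def combined_reward(success: float, bleu: float, success_weight: float = 0.5) -> float:
    """Weighted sum of task success and BLEU (both assumed to lie in [0, 1]).

    success_weight is a hypothetical hyperparameter; the source does not
    give the weights actually used.
    """
    if not 0.0 <= success_weight <= 1.0:
        raise ValueError("success_weight must be in [0, 1]")
    return success_weight * success + (1.0 - success_weight) * bleu


def policy_gradient_loss(
    token_log_probs: Sequence[float], reward: float, baseline: float = 0.0
) -> float:
    """REINFORCE-style loss for one sampled response (an assumed update rule).

    The scalar reward is not differentiable, so it only scales the summed
    log-probabilities of the sampled tokens.
    """
    return -(reward - baseline) * sum(token_log_probs)


# Example: a dialogue that completed the task but has a low BLEU score.
reward = combined_reward(success=1.0, bleu=0.2, success_weight=0.5)
loss = policy_gradient_loss([-0.5, -1.5], reward)
```

Because the reward enters only as a scalar multiplier, metrics such as success rate and BLEU can shape training even though neither can be differentiated with respect to the model parameters.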
Taxonomy
Topics: Speech and dialogue systems · AI in Service Interactions · Robotics and Automated Systems
Methods: Balanced Selection
