Improving Multi-Domain Task-Oriented Dialogue System with Offline Reinforcement Learning

Dharmendra Prajapat; Durga Toshniwal

arXiv:2411.05340 · cs.CL · November 11, 2024

Open Access

TL;DR

This paper introduces a multi-domain task-oriented dialogue system that combines supervised learning and reinforcement learning, using a reward computed as a weighted sum of success rate and BLEU score to improve both task completion and response quality.

Contribution

The paper presents a novel approach that fine-tunes a pre-trained GPT2 model with both supervised learning and reinforcement learning to mitigate exposure bias and token loss in dialogue systems.
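The page does not name the reinforcement learning algorithm, so the sketch below uses plain REINFORCE as a stand-in to show how a policy can be updated from a reward that is never differentiated. It is a toy under stated assumptions: a softmax policy over three hypothetical candidate responses instead of GPT2 token sequences, fixed made-up rewards, and an arbitrary baseline of 0.5. It also samples from the current policy, which may differ from the paper's offline setup.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, reward_fn, lr=0.5, baseline=0.0, rng=random):
    """One REINFORCE update for a softmax policy over candidate responses.

    reward_fn is treated as a black box (it is never differentiated), so it can
    be any non-differentiable score. For a softmax policy, the gradient of
    log pi(a) with respect to logit i is 1[i == a] - pi(i), so the update only
    needs the sampled action and its reward.
    """
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]
    # Subtracting a baseline reduces variance without changing the expected gradient.
    advantage = reward_fn(action) - baseline
    new_logits = [
        z + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (z, p) in enumerate(zip(logits, probs))
    ]
    return new_logits, action


# Toy usage: three hypothetical candidate responses with fixed rewards.
rng = random.Random(0)
rewards = [0.1, 0.9, 0.3]
logits = [0.0, 0.0, 0.0]
for _ in range(500):
    logits, _ = reinforce_step(logits, lambda a: rewards[a], baseline=0.5, rng=rng)
print(softmax(logits))  # probability mass concentrates on candidate 1
```

For a language model, the same update is typically applied through automatic differentiation to the summed token log-probabilities of a generated response; the closed-form softmax gradient here just keeps the example dependency-free.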

Findings

01

Increases inform rate by 1.60% on MultiWOZ2.1

02

Improves success rate by 3.17% on MultiWOZ2.1

03

Effectively balances task success and response fluency

Abstract

A task-oriented dialogue (TOD) system is designed to accomplish user-defined tasks through dialogue. TOD systems have progressed towards end-to-end modeling by leveraging pre-trained large language models. Fine-tuning these pre-trained language models with supervised learning alone leads to exposure bias and token loss, which steer the models away from completing the user's task. To address these issues, we propose a TOD system that uses a unified pre-trained language model, GPT2, as its base model and optimizes it with both supervised learning and reinforcement learning (RL). The issues are mitigated using a non-differentiable reward function, calculated as a weighted sum of the success rate and BLEU evaluation metrics. Including success rate and BLEU in the reward guides the language model toward completing the user's task while ensuring a…
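The abstract describes the reward as a weighted sum of success rate and BLEU. A minimal sketch, assuming a per-dialogue binary success flag, a simplified smoothed sentence-level BLEU on whitespace tokens, and hypothetical equal weights of 0.5 (the actual weights are not given here). The function names `sentence_bleu` and `dialogue_reward` are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the order-n n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified single-reference sentence BLEU with add-one smoothing.

    Illustrative only: BLEU reported in papers is usually corpus-level with its
    own tokenization, so values will not match published numbers.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngram_counts(hyp, n), ngram_counts(ref, n)
        # Clipped matches: a hypothesis n-gram counts at most as often as in the reference.
        matches = sum(min(count, ref_ngrams[g]) for g, count in hyp_ngrams.items())
        total = max(len(hyp) - n + 1, 0)
        # Add-one smoothing keeps a missing n-gram order from zeroing the score.
        log_precision += math.log((matches + 1) / (total + 1)) / max_n
    # Brevity penalty: responses shorter than the reference are scaled down.
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_precision)


def dialogue_reward(success, bleu, w_success=0.5, w_bleu=0.5):
    """Hypothetical reward: weighted sum of task success (0/1) and BLEU in [0, 1]."""
    return w_success * float(success) + w_bleu * bleu
```

For example, `dialogue_reward(True, sentence_bleu(generated, reference))` scores a successful dialogue whose response exactly matches the reference at 1.0. Raising `w_success` relative to `w_bleu` would favor task completion over surface similarity to the reference.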

Taxonomy

Topics: Speech and dialogue systems · AI in Service Interactions · Robotics and Automated Systems

Methods: Balanced Selection