Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning

Ryan Shea; Zhou Yu

arXiv:2310.10735 · cs.CL · October 18, 2023 · 1 citation

TL;DR

This paper introduces an offline reinforcement learning framework, together with a variance-reducing importance sampling method, that improves persona consistency and dialogue quality in open-domain chatbots at a lower training cost than online RL.

Contribution

The paper presents an offline RL approach that trains on existing supervised data while rewarding and punishing specific utterances, and it introduces VaRMI, an importance sampling method that reduces the variance of importance weights to improve training stability.

Findings

1. Improved persona consistency in dialogue agents.

2. Enhanced dialogue quality according to automatic and human evaluations.

3. Reduced training costs compared to online RL methods.

Abstract

Maintaining a consistent persona is a key quality for any open domain dialogue system. Current state-of-the-art systems do this by training agents with supervised learning or online reinforcement learning (RL). However, systems trained with supervised learning often lack consistency as they are never punished for uttering contradictions. Additional training with RL can alleviate some of these issues; however, the training process is expensive. Instead, we propose an offline RL framework to improve the persona consistency of dialogue systems. Our framework allows us to combine the advantages of previous methods, as we can inexpensively train our model on existing data as in supervised learning, while punishing and rewarding specific utterances as in RL. We also introduce a simple importance sampling method to reduce the variance of importance weights in offline RL training, which we call…
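To make the idea of importance-weighted offline training concrete, here is a minimal sketch of a generic offline policy-gradient loss on a toy batch. It is not the paper's VaRMI method, whose details are not given on this page. Weight clipping is used here only as a familiar stand-in for variance reduction, and every name and number below (`importance_weights`, `offline_pg_loss`, the log-probabilities, the rewards, the clip value) is a hypothetical chosen for illustration.

```python
import math
from statistics import pvariance


def importance_weights(logp_policy, logp_behavior, clip=None):
    """Per-utterance weights w = pi_theta(a|s) / pi_b(a|s).

    If `clip` is given, weights are capped at that value. Clipping is a
    generic variance-reduction trick assumed here, NOT the paper's VaRMI.
    """
    weights = [math.exp(lp - lb) for lp, lb in zip(logp_policy, logp_behavior)]
    if clip is not None:
        weights = [min(w, clip) for w in weights]
    return weights


def offline_pg_loss(logp_policy, rewards, weights):
    """Surrogate loss -mean(w * r * log pi_theta), with weights held constant."""
    terms = [w * r * lp for w, r, lp in zip(weights, rewards, logp_policy)]
    return -sum(terms) / len(terms)


# Toy batch (made-up numbers): log-probs of four logged utterances under the
# current policy and under the policy that produced the data, plus rewards
# (+1 for a persona-consistent utterance, -1 for a contradiction).
logp_policy = [-0.2, -1.5, -0.1, -3.0]
logp_behavior = [-1.0, -1.4, -2.5, -2.9]
rewards = [1.0, -1.0, 1.0, -1.0]

raw = importance_weights(logp_policy, logp_behavior)
clipped = importance_weights(logp_policy, logp_behavior, clip=2.0)

print("weight variance (raw):    ", pvariance(raw))
print("weight variance (clipped):", pvariance(clipped))
print("loss with clipped weights:", offline_pg_loss(logp_policy, rewards, clipped))
```

In this toy batch one utterance is far more likely under the current policy than under the data-collecting policy, so its raw weight dominates the estimate. Capping the weights trades some bias for lower variance, which is the general tension a variance-reducing importance sampling scheme has to manage.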

Code & Models

Repositories

ryanshea10/personachat_offline_rl
PyTorch · Official

Taxonomy

Topics: AI in Service Interactions · Topic Modeling · Persona Design and Applications