Hierarchical Reinforcement Learning for Open-Domain Dialog

Abdelrhman Saleh; Natasha Jaques; Asma Ghandeharioun; Judy Hanwen Shen; Rosalind Picard

arXiv:1909.07547·cs.LG·January 3, 2020

TL;DR

This paper introduces VHRL, a hierarchical reinforcement learning method that improves open-domain dialog generation by optimizing long-term conversational rewards, leading to more human-like and appropriate interactions.

Contribution

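The paper presents a hierarchical RL framework that uses policy gradients to tune the utterance-level embedding of a variational sequence model, rather than acting at the word level, which makes credit assignment for long-term conversational rewards easier to learn.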

Findings

01 Significant improvements in human evaluation metrics.

02 Enhanced automatic metrics for dialog quality.

03 Outperforms state-of-the-art Transformer-based models.

Abstract

Open-domain dialog generation is a challenging problem; maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets may lead to the generation of inappropriate, biased, or offensive text. Reinforcement Learning (RL) is a powerful framework that could potentially address these issues, for example by allowing a dialog model to optimize for reducing toxicity and repetitiveness. However, previous approaches which apply RL to open-domain dialog generation do so at the word level, making it difficult for the model to learn proper credit assignment for long-term conversational rewards. In this paper, we propose a novel approach to hierarchical reinforcement learning, VHRL, which uses policy gradients to tune the utterance-level embedding of a variational sequence model. This…
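To make the utterance-level idea in the abstract concrete, the following is a minimal sketch of a policy-gradient (REINFORCE) update applied to a single utterance-level embedding. It is not the authors' implementation: the Gaussian policy, the toy reward, the target embedding, and the names `utterance_reward` and `reinforce_utterance_level` are all assumptions made for illustration, and the word-level decoder of a real variational sequence model is omitted entirely.

```python
import random


def utterance_reward(z, target):
    """Toy stand-in for a conversation-level reward (an assumption):
    higher when the utterance embedding z is closer to a target embedding."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def reinforce_utterance_level(target, steps=4000, lr=0.005, sigma=0.5, seed=0):
    """REINFORCE on the mean of a Gaussian policy over utterance embeddings.

    Each sampled z plays the role of one utterance-level action, so the
    reward is credited to the whole utterance instead of to single words.
    """
    rng = random.Random(seed)
    mean = [0.0] * len(target)  # learnable policy parameter
    baseline = 0.0              # running average reward, reduces variance
    for _ in range(steps):
        # Sample one utterance-level action from the current policy.
        z = [rng.gauss(m, sigma) for m in mean]
        reward = utterance_reward(z, target)
        advantage = reward - baseline
        # Gradient of log N(z; mean, sigma^2 I) w.r.t. mean is (z - mean) / sigma^2.
        mean = [
            m + lr * advantage * (zi - m) / sigma ** 2
            for m, zi in zip(mean, z)
        ]
        # Update the baseline only after it has been used for this step.
        baseline = 0.9 * baseline + 0.1 * reward
    return mean


if __name__ == "__main__":
    learned = reinforce_utterance_level([1.0, -1.0])
    print("learned utterance-level mean:", learned)
```

In this toy setting the policy mean drifts toward the hypothetical target because the reward is attached to the sampled embedding as a whole. A word-level variant would instead have to spread the same scalar reward across every token decision in the utterance, which is the credit-assignment difficulty the abstract describes.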

Code & Models

Repositories

natashamjaques/neural_chat (PyTorch, official)