# Deep Reinforcement Learning For Modeling Chit-Chat Dialog With Discrete Attributes

**Authors:** Chinnadhurai Sankar, Sujith Ravi

arXiv: 1907.02848 · 2019-09-17

## TL;DR

This paper introduces a reinforcement learning approach for open-domain dialog systems that conditions response generation on discrete dialog attributes, yielding more diverse and less repetitive responses.

## Contribution

It proposes an RL formulation in which the policy selects dialog attributes rather than individual tokens. This shrinks the action space and produces more diverse, less redundant responses.
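The sketch below makes the action-space argument concrete by comparing how many choices a token-level policy faces per turn with how many an attribute-level policy faces. The vocabulary size, utterance length, and attribute inventories (`DIALOG_ACTS`, `SENTIMENTS`) are hypothetical values for illustration, not the paper's settings. Composed attributes are modeled here, as an assumption, as the Cartesian product of individual attribute sets.

```python
import math
from itertools import product

# Hypothetical sizes chosen for illustration; not the paper's settings.
VOCAB_SIZE = 10_000   # tokens the generator can emit at each step
UTTERANCE_LEN = 15    # tokens per response

# Hypothetical discrete attribute inventories.
DIALOG_ACTS = ["statement", "question", "opinion", "greeting", "thanks"]
SENTIMENTS = ["positive", "neutral", "negative"]


def token_action_space(vocab_size: int, length: int) -> int:
    """Distinct token sequences a token-level policy chooses among per turn."""
    return vocab_size ** length


def composed_attributes(*attribute_sets):
    """Composed attributes, modeled (by assumption) as a Cartesian product."""
    return list(product(*attribute_sets))


if __name__ == "__main__":
    composed = composed_attributes(DIALOG_ACTS, SENTIMENTS)
    print("attribute-level actions per turn:", len(composed))  # 5 * 3 = 15
    exponent = math.log10(token_action_space(VOCAB_SIZE, UTTERANCE_LEN))
    print(f"token-level sequences per turn: about 10^{exponent:.0f}")
```

Even with composition, the attribute-level action space stays in the tens, while the token-level space grows exponentially with utterance length. This gap is the intuition behind the claim that attribute-level policy optimization is more sample efficient.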

## Key findings

- Improved model perplexity and response diversity.
- Higher scores in human evaluations.
- More practical and sample-efficient policy optimization, because the action space is restricted to dialog attributes.
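The perplexity finding refers to a model whose generation is conditioned on an attribute. For reference, the standard definition of conditional perplexity for a response $y = (y_1, \dots, y_T)$ given a dialog context $x$ and a discrete attribute $a$ is shown below. This is the generic metric, not a formula quoted from the paper, and treating $a$ as an extra conditioning input is an assumption about the setup.

$$
\mathrm{PPL}(y \mid x, a) = \exp\!\left( -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta\left(y_t \mid y_{<t}, x, a\right) \right)
$$

If the attribute carries information about the response, conditioning on it can lower the per-token negative log-likelihood and therefore the perplexity. How the paper encodes $a$ into the model is not described in this summary.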

## Abstract

Open-domain dialog systems face the challenge of being repetitive and producing generic responses. In this paper, we demonstrate that conditioning response generation on interpretable discrete dialog attributes and composed attributes improves model perplexity and results in diverse, interesting, non-redundant responses. We propose to formulate dialog attribute prediction as a reinforcement learning (RL) problem and use policy gradient methods to optimize utterance generation using long-term rewards. Unlike existing RL approaches, which formulate token prediction as the policy, our method reduces the complexity of policy optimization by limiting the action space to dialog attributes, thereby making policy optimization more practical and sample efficient. We demonstrate this with experimental and human evaluations.
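The abstract's core idea, treating attribute selection as the policy's action and optimizing it with policy gradients on long-term reward, can be illustrated with a minimal REINFORCE sketch. Everything here is an assumption for illustration: the hypothetical attribute set `ATTRIBUTES`, a tabular softmax policy keyed on the previous turn's attribute, and `toy_reward`, which simply penalizes repeating the previous attribute as a stand-in for a real long-term reward. The response generator that would consume the chosen attribute is omitted; the point is that the gradient is taken over a small categorical action space rather than over the vocabulary.

```python
import math
import random

ATTRIBUTES = ["statement", "question", "opinion", "backchannel"]  # hypothetical
N = len(ATTRIBUTES)
START = N  # extra state index meaning "no previous turn"


def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class AttributePolicy:
    """Tabular policy pi(attribute | previous attribute) with softmax logits."""

    def __init__(self):
        self.logits = [[0.0] * N for _ in range(N + 1)]

    def probs(self, state):
        return softmax(self.logits[state])

    def sample(self, state, rng):
        return rng.choices(range(N), weights=self.probs(state))[0]


def toy_reward(prev_attr, attr):
    # Hypothetical reward: -1 for repeating the previous attribute, +1 otherwise.
    # START never equals an attribute index, so the first turn always gets +1.
    return -1.0 if attr == prev_attr else 1.0


def run_episode(policy, rng, turns=5):
    state, trajectory = START, []
    for _ in range(turns):
        action = policy.sample(state, rng)
        trajectory.append((state, action, toy_reward(state, action)))
        state = action
    return trajectory


def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over the episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))


def reinforce(episodes=2000, lr=0.05, gamma=0.9, seed=0):
    """Plain REINFORCE without a baseline, kept minimal for brevity."""
    rng = random.Random(seed)
    policy = AttributePolicy()
    for _ in range(episodes):
        traj = run_episode(policy, rng)
        returns = discounted_returns([r for _, _, r in traj], gamma)
        for (state, action, _), g in zip(traj, returns):
            probs = policy.probs(state)
            # Gradient of log softmax w.r.t. the logits: one_hot(action) - probs.
            for k in range(N):
                grad = (1.0 if k == action else 0.0) - probs[k]
                policy.logits[state][k] += lr * g * grad
    return policy


if __name__ == "__main__":
    trained = reinforce()
    for s, name in enumerate(ATTRIBUTES):
        print(f"P(repeat {name!r}) = {trained.probs(s)[s]:.3f}")
```

After training, the probability of repeating the previous attribute should fall well below the uniform 0.25, mirroring (in toy form) how an attribute-level reward can discourage repetitive dialog. A real system would add a variance-reducing baseline and a learned context encoder instead of the tabular state.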

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.02848/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1907.02848/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/1907.02848/full.md

---
Source: https://tomesphere.com/paper/1907.02848