Improving a sequence-to-sequence NLP model using a reinforcement learning policy algorithm

Jabri Ismail; Aboulbichr Ahmed; El ouaazizi Aziza

arXiv:2212.14117 · cs.CL · January 19, 2023

TL;DR

This paper introduces a deep reinforcement learning approach to improve sequence-to-sequence NLP dialogue models by optimizing for long-term conversational quality, coherence, and engagement.

Contribution

It combines reinforcement learning with dialogue generation to enhance the coherence and interactivity of chatbot responses, a novel integration for long-term dialogue modeling.

Findings

01. Generated responses are more interactive and engaging.

02. The model encourages sustained, successful conversations.

03. The approach improves the diversity, length, and complexity of responses.

Abstract

Current neural network models of dialogue generation (chatbots) show great promise for generating answers for conversational agents. However, they are short-sighted: they predict utterances one at a time and disregard the impact of each utterance on future outcomes. Modelling a dialogue's future direction is critical for generating coherent, interesting dialogues, a need that has led traditional NLP dialogue models to rely on reinforcement learning. In this article, we explain how to combine these objectives by using deep reinforcement learning to predict future rewards in chatbot dialogue. The model simulates conversations between two virtual agents, and policy gradient methods reward sequences that exhibit three useful conversational characteristics: the flow of informality, coherence, and simplicity of response (related to forward-looking function). We assess our model based…
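
The policy gradient idea mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's model: it replaces the sequence-to-sequence network with a softmax policy over three fixed candidate replies, and the candidate replies, their per-characteristic scores, the mixing weights, and the helper names (`reward`, `softmax`, `reinforce`) are all hypothetical values chosen for illustration. It only shows how a REINFORCE-style update shifts probability toward replies whose combined reward is higher.

```python
import math
import random

# Hypothetical candidate replies. Each tuple holds made-up scores in [0, 1]
# for the three conversational characteristics named in the abstract,
# in the order the abstract lists them. None of these numbers come from the paper.
CANDIDATES = {
    "I don't know.": (0.1, 0.5, 0.1),
    "I like hiking. And you?": (0.8, 0.7, 0.9),
    "Hiking hiking hiking.": (0.2, 0.1, 0.3),
}
WEIGHTS = (0.25, 0.25, 0.5)  # assumed mixing weights, not taken from the paper


def reward(scores, weights=WEIGHTS):
    """Combine the three per-characteristic scores into one scalar reward."""
    return sum(w * s for w, s in zip(weights, scores))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(steps=2000, lr=0.1, seed=0):
    """Run REINFORCE with a running-average baseline on the toy policy."""
    rng = random.Random(seed)
    replies = list(CANDIDATES)
    logits = [0.0] * len(replies)  # policy parameters, one logit per reply
    baseline = 0.0                 # running estimate of the average reward
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a reply from the current policy.
        a = rng.choices(range(len(replies)), weights=probs)[0]
        r = reward(CANDIDATES[replies[a]])
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)
        # Gradient of log pi(a) w.r.t. logit i is 1[i == a] - pi(i).
        for i in range(len(logits)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return dict(zip(replies, softmax(logits)))


if __name__ == "__main__":
    for reply, p in reinforce().items():
        print(f"{p:.3f}  {reply}")
```

In a full dialogue setting the reward would be accumulated over the turns of a simulated conversation between two agents rather than computed for a single reply, and the logits would be produced by the sequence-to-sequence model instead of stored in a list.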
