Deep Reinforcement Learning For Sequence to Sequence Models

Yaser Keneshloo; Tian Shi; Naren Ramakrishnan; Chandan K. Reddy

arXiv:1805.09461 · cs.LG · April 17, 2019

TL;DR

This paper explores how deep reinforcement learning can enhance sequence-to-sequence models, addressing common issues like exposure bias and train/test discrepancies, with a focus on complex tasks such as text summarization.

Contribution

It introduces a formulation that pairs the decision-making strengths of RL with the long-term memory of seq2seq models, discusses recent frameworks that take this approach, and provides source code for implementing RL-enhanced seq2seq models.

Findings

01 RL methods help address exposure bias in seq2seq models

02 Integration of RL improves long-term memory in sequence tasks

03 Source code supports implementation of RL-enhanced seq2seq models

Abstract

In recent times, sequence-to-sequence (seq2seq) models have gained a lot of popularity and provide state-of-the-art performance in a wide variety of tasks such as machine translation, headline generation, text summarization, speech to text conversion, and image caption generation. The underlying framework for all these models is usually a deep neural network comprising an encoder and a decoder. Although simple encoder-decoder models produce competitive results, many researchers have proposed additional improvements over these sequence-to-sequence models, e.g., using an attention-based model over the input, pointer-generation models, and self-attention models. However, such seq2seq models suffer from two common problems: 1) exposure bias and 2) inconsistency between train/test measurement. Recently, a completely novel point of view has emerged in addressing these two problems in seq2seq…
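The two problems named in the abstract can be made concrete with a small sketch. This is not the authors' implementation. It is a minimal REINFORCE-style loss with a greedy-decoding baseline, written under several assumptions: the per-step logits are fixed toy values (a real decoder would recompute them from the tokens it has generated so far), and `overlap_reward` is a hypothetical stand-in for a sequence-level test metric such as ROUGE. The model is scored on a sequence it sampled itself, which is how it would run at test time, and the reward is the same kind of measure used for evaluation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_log_prob(step_logits, tokens):
    """Sum of the log-probabilities assigned to `tokens`, one per decoding step."""
    return sum(math.log(softmax(l)[t]) for l, t in zip(step_logits, tokens))


def greedy_decode(step_logits):
    """Pick the highest-scoring token at every step."""
    return [max(range(len(l)), key=l.__getitem__) for l in step_logits]


def overlap_reward(candidate, reference):
    """Hypothetical toy reward: fraction of reference tokens found in candidate."""
    ref = set(reference)
    return len(ref & set(candidate)) / len(ref)


def policy_gradient_loss(step_logits, sampled, reference):
    """Surrogate loss -(r(sampled) - r(greedy)) * log p(sampled).

    Minimizing it raises the probability of samples that beat the greedy
    output on the reward and lowers it for samples that do worse.
    """
    baseline = overlap_reward(greedy_decode(step_logits), reference)
    advantage = overlap_reward(sampled, reference) - baseline
    return -advantage * sequence_log_prob(step_logits, sampled)


# Toy example: 2 decoding steps over a 3-token vocabulary (assumed values).
step_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
reference = [0, 2]
sampled = [0, 2]  # a sequence drawn from the model, not the ground truth prefix
loss = policy_gradient_loss(step_logits, sampled, reference)
```

Here the sampled sequence matches the reference better than the greedy output does, so the advantage is positive and gradient descent on `loss` would push the model toward that sample. In an actual training loop the logits would come from a differentiable decoder and the loss would be backpropagated through them.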

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence