TL;DR
This paper explores how deep reinforcement learning can enhance sequence-to-sequence models by addressing two common problems, exposure bias and the inconsistency between training and test-time measurement, with a focus on complex tasks such as text summarization.
Contribution
It introduces a formulation that combines RL with seq2seq models, discusses recent frameworks, and provides source code for RL-enhanced seq2seq models aimed at improving long-term memory and decision-making.
Findings
RL methods help address exposure bias in seq2seq models
Integration of RL improves long-term memory in sequence tasks
Source code supports implementation of RL-enhanced seq2seq models
Abstract
In recent times, sequence-to-sequence (seq2seq) models have gained a lot of popularity and provide state-of-the-art performance in a wide variety of tasks such as machine translation, headline generation, text summarization, speech to text conversion, and image caption generation. The underlying framework for all these models is usually a deep neural network comprising an encoder and a decoder. Although simple encoder-decoder models produce competitive results, many researchers have proposed additional improvements over these sequence-to-sequence models, e.g., using an attention-based model over the input, pointer-generation models, and self-attention models. However, such seq2seq models suffer from two common problems: 1) exposure bias and 2) inconsistency between train/test measurement. Recently, a completely novel point of view has emerged in addressing these two problems in seq2seq…
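As a rough illustration of the two problems named in the abstract, the sketch below shows one common way RL is applied to seq2seq training: the model is rewarded on sequences it samples itself (so training sees its own outputs, which targets exposure bias), and the reward is a sequence-level evaluation metric (so the training signal matches the test measurement). This is a minimal REINFORCE-with-baseline sketch, not the paper's formulation, which is not included in this excerpt. The function names `unigram_f1` and `policy_gradient_loss` are hypothetical, and the toy overlap score is only a stand-in for a real metric such as ROUGE.

```python
import math


def unigram_f1(candidate, reference):
    """Toy sequence-level reward (assumption: a stand-in for a metric like ROUGE)."""
    cand, ref = set(candidate), set(reference)
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def policy_gradient_loss(token_probs, sampled, baseline, reference):
    """REINFORCE-style loss with a baseline for a single sampled sequence.

    token_probs: probability the model assigned to each sampled token.
    sampled:     tokens sampled from the model's own output distribution.
    baseline:    tokens from a baseline decode (for example greedy decoding),
                 whose reward is subtracted to reduce variance.
    reference:   ground-truth tokens used only to compute the reward.
    """
    # Log-probability of the whole sampled sequence under the model.
    log_prob = sum(math.log(p) for p in token_probs)
    # Positive advantage: the sample scored better than the baseline decode.
    advantage = unigram_f1(sampled, reference) - unigram_f1(baseline, reference)
    # Minimizing this raises the likelihood of better-than-baseline samples.
    return -advantage * log_prob


reference = ["the", "cat", "sat"]
loss = policy_gradient_loss(
    token_probs=[0.5, 0.5, 0.5],
    sampled=["the", "cat", "sat"],
    baseline=["a", "dog"],
    reference=reference,
)
```

In a real model the gradient of this loss would be taken with respect to the network parameters that produce `token_probs`; here the probabilities are plain numbers so the arithmetic can be followed by hand.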
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
