An Actor-Critic Algorithm for Sequence Prediction
Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio

TL;DR
This paper introduces an actor-critic reinforcement learning approach for sequence prediction tasks, improving training-test consistency and optimizing task-specific metrics like BLEU in natural language generation.
Contribution
It adapts actor-critic training to the supervised setting: a critic network, conditioned on the ground-truth output, predicts the value of each output token, so that training more closely matches test-time generation.
Findings
Improved BLEU scores in machine translation.
Enhanced performance on synthetic sequence tasks.
Closer training-test alignment in sequence prediction.
Abstract
We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved…
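To make the actor/critic interaction in the abstract concrete, here is a minimal single-timestep sketch in Python with NumPy. It is an illustration under stated assumptions, not the paper's implementation: the actor is reduced to a softmax over a raw logit vector, the critic's per-token value estimates are passed in as a plain array `q_values`, and all function names are hypothetical. It shows how value estimates for every candidate token can be turned into an update that raises the actor's expected value.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def expected_value(logits, q_values):
    """Expected critic value under the actor's token distribution."""
    return float(np.dot(softmax(logits), q_values))

def actor_logit_gradient(logits, q_values):
    """Gradient of the expected value with respect to the actor logits.

    For a softmax policy p, d/dlogit_i sum_a p(a) Q(a) = p_i * (Q_i - V),
    where V = sum_a p(a) Q(a). The critic values are treated as constants.
    """
    p = softmax(logits)
    v = np.dot(p, q_values)
    return p * (q_values - v)

def actor_step(logits, q_values, lr=0.5):
    """One gradient-ascent step on the actor logits (lr is an assumed value)."""
    return logits + lr * actor_logit_gradient(logits, q_values)

if __name__ == "__main__":
    logits = np.zeros(3)                  # uniform policy over 3 tokens
    q_values = np.array([0.1, 0.9, 0.2])  # hypothetical critic outputs
    updated = actor_step(logits, q_values)
    print(softmax(logits), softmax(updated))
```

In a full system the logits would come from a sequence model conditioned on previously generated tokens, and the critic would be a second network trained to predict a task score such as BLEU; both are replaced by fixed arrays here to keep the sketch self-contained.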
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
