An Actor-Critic Algorithm for Sequence Prediction

Dzmitry Bahdanau; Philemon Brakel; Kelvin Xu; Anirudh Goyal; Ryan Lowe; Joelle Pineau; Aaron Courville; Yoshua Bengio

arXiv:1607.07086 · cs.LG · March 6, 2017 · 224 cites

TL;DR

This paper introduces an actor-critic reinforcement learning approach for sequence prediction tasks, improving training-test consistency and optimizing task-specific metrics like BLEU in natural language generation.

Contribution

It presents a novel supervised learning method using a critic network to better align training with testing conditions in sequence generation models.

Findings

01 Improved BLEU scores in machine translation.

02 Enhanced performance on synthetic sequence tasks.

03 Closer training-test alignment in sequence prediction.

Abstract

We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved…
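The idea of a critic that scores each candidate output token can be pictured with a small sketch. The code below is an illustration only, not the authors' implementation: it assumes a softmax actor over a three-token vocabulary and hand-picked critic estimates, and the names `softmax`, `expected_value`, and `actor_logit_gradient` are hypothetical. It computes the expected value of one decoding step under the actor's policy and the gradient of that value with respect to the actor's logits, which shifts probability toward tokens the critic rates above average.

```python
import math


def softmax(logits):
    """Turn raw actor scores into a probability distribution over tokens."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_value(probs, q_values):
    """Expected critic value of one step under the actor's policy."""
    return sum(p * q for p, q in zip(probs, q_values))


def actor_logit_gradient(logits, q_values):
    """Gradient of the expected value with respect to each actor logit.

    For a softmax policy, d/d logit_k of sum_a p(a) * Q(a) equals
    p(k) * (Q(k) - V), where V is the expected value of the step.
    """
    probs = softmax(logits)
    v = expected_value(probs, q_values)
    return [p * (q - v) for p, q in zip(probs, q_values)]


# Hypothetical numbers: a uniform actor and made-up critic estimates
# of how much each candidate token would contribute to the task score.
logits = [0.0, 0.0, 0.0]
q_values = [0.1, 0.7, 0.4]
grad = actor_logit_gradient(logits, q_values)
```

With these made-up values, the gradient is positive for the token the critic rates highest and negative for the one it rates lowest, and the components sum to zero because a softmax only redistributes probability mass.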

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques