Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback

Khanh Nguyen; Hal Daumé III; Jordan Boyd-Graber

arXiv:1707.07402·cs.CL·November 15, 2017

TL;DR

This paper presents a reinforcement learning approach that improves neural machine translation from simulated human feedback, optimizing translation quality even when the ratings are skewed, high-variance, and granular.

Contribution

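The authors introduce a reinforcement learning algorithm that combines advantage actor-critic (Mnih et al., 2016) with an attention-based neural encoder-decoder (Luong et al., 2015). It is designed for problems with a large action space and delayed rewards, and it is robust to skewed, high-variance, granular feedback.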
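To make the actor-critic idea concrete, here is a minimal sketch of a single bandit update, under strong simplifying assumptions: one decision over a tiny discrete action set instead of a full translation, a scalar running baseline standing in for the critic network, and plain lists instead of a neural model. The names `softmax`, `a2c_step`, `reward_fn`, `lr_actor`, and `lr_critic` are hypothetical and do not come from the paper or its repository.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_step(logits, baseline, reward_fn, lr_actor=0.5, lr_critic=0.1, rng=random):
    """One bandit update: sample an action and observe a reward for it only.

    Illustrative assumption: `baseline` is a single scalar playing the role
    of the critic; a real system would predict a value per decoding step.
    """
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    reward = reward_fn(action)          # feedback for the sampled action only
    advantage = reward - baseline       # how much better than expected

    # Gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs.
    new_logits = [
        logit + lr_actor * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]
    # Move the critic's estimate toward the observed reward.
    new_baseline = baseline + lr_critic * advantage
    return new_logits, new_baseline, action, reward


if __name__ == "__main__":
    rng = random.Random(0)
    rewards = [0.1, 0.9, 0.3]           # hypothetical per-action quality scores
    logits, baseline = [0.0, 0.0, 0.0], 0.0
    for _ in range(500):
        logits, baseline, _, _ = a2c_step(logits, baseline, lambda a: rewards[a], rng=rng)
    print(softmax(logits), baseline)
```

Subtracting the baseline does not change the expected gradient direction, but it reduces variance, which matters when each sampled output receives only one noisy score.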

Findings

01. Improves translation quality using simulated human feedback.

02. Effectively optimizes traditional translation metrics.

03. Robust to feedback noise and high variance.

Abstract

Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve. Yet, current neural machine translation training focuses on expensive human-generated reference translations. We describe a reinforcement learning algorithm that improves neural machine translation systems from simulated human feedback. Our algorithm combines the advantage actor-critic algorithm (Mnih et al., 2016) with the attention-based neural encoder-decoder architecture (Luong et al., 2015). This algorithm (a) is well-designed for problems with a large action space and delayed rewards, (b) effectively optimizes traditional corpus-level machine translation metrics, and (c) is robust to skewed, high-variance, granular feedback modeled after actual human behaviors.
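As a rough illustration of what "skewed, high-variance, granular feedback" could look like in simulation, the sketch below perturbs a quality score in [0, 1] in three steps. This is an assumption-laden toy, not the paper's actual perturbation functions: the function name `simulate_rating` and the parameters `bins`, `noise_std`, and `skew` are hypothetical, and the power-law skew and Gaussian noise are choices made here for simplicity.

```python
import random


def simulate_rating(true_score, bins=5, noise_std=0.1, skew=1.0, rng=random):
    """Turn a quality score in [0, 1] into a skewed, noisy, coarse rating.

    Assumptions for illustration only: skew is a power transform, noise is
    Gaussian, and granularity is rounding to `bins` evenly spaced levels
    (requires bins >= 2).
    """
    clipped = min(1.0, max(0.0, true_score))
    skewed = clipped ** skew                      # skew > 1: harsh rater, skew < 1: lenient rater
    noisy = skewed + rng.gauss(0.0, noise_std)    # high-variance judgments
    noisy = min(1.0, max(0.0, noisy))
    level = round(noisy * (bins - 1))             # snap to a discrete rating level
    return level / (bins - 1)


if __name__ == "__main__":
    rng = random.Random(42)
    for score in (0.2, 0.5, 0.8):
        ratings = [simulate_rating(score, rng=rng) for _ in range(5)]
        print(score, ratings)
```

Setting `noise_std=0.0` and `skew=1.0` reduces the function to pure binning, which makes it easy to study each perturbation in isolation.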

Code & Models

Repositories

khanhptnk/bandit-nmt (PyTorch, official)