TL;DR
This paper presents a reinforcement learning approach that improves neural machine translation from simulated human feedback, optimizing translation quality even when user ratings are noisy, delayed, and coarse-grained.
Contribution
The authors introduce an RL algorithm that combines advantage actor-critic (Mnih et al., 2016) with an attention-based neural encoder-decoder (Luong et al., 2015). It is designed for large action spaces and delayed rewards, and it is robust to skewed, high-variance, granular feedback.
Findings
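Improves neural machine translation systems using simulated human feedback instead of reference translations.
Effectively optimizes traditional corpus-level translation metrics.
Remains robust to skewed, high-variance, granular feedback modeled after actual human behavior.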
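To make the three feedback properties concrete, here is a minimal sketch of how a clean quality score might be degraded into a simulated user rating. The function name `perturb_rating`, its parameters, and the specific forms chosen (a power-law skew, additive Gaussian noise, rounding to evenly spaced levels) are illustrative assumptions, not the perturbation functions used in the paper.

```python
import random


def perturb_rating(score, bins=5, noise_std=0.1, skew=1.0, rng=random):
    """Turn a clean quality score in [0, 1] into a simulated user rating.

    All three perturbations below are assumed forms chosen for illustration.
    """
    # Skew: an exponent > 1 models a harsh rater, < 1 a lenient one.
    skewed = score ** skew
    # Variance: additive Gaussian noise, clipped back into [0, 1].
    noisy = min(1.0, max(0.0, skewed + rng.gauss(0.0, noise_std)))
    # Granularity: snap to the nearest of `bins` evenly spaced levels.
    return round(noisy * (bins - 1)) / (bins - 1)
```

With `noise_std=0.0` and `skew=1.0` the function only quantizes, so a score of 0.6 on a five-level scale becomes 0.5. Raising `noise_std` or moving `skew` away from 1.0 gives progressively less reliable ratings to train against.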
Abstract
Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve. Yet, current neural machine translation training focuses on expensive human-generated reference translations. We describe a reinforcement learning algorithm that improves neural machine translation systems from simulated human feedback. Our algorithm combines the advantage actor-critic algorithm (Mnih et al., 2016) with the attention-based neural encoder-decoder architecture (Luong et al., 2015). This algorithm (a) is well-designed for problems with a large action space and delayed rewards, (b) effectively optimizes traditional corpus-level machine translation metrics, and (c) is robust to skewed, high-variance, granular feedback modeled after actual human behaviors.
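The core idea of advantage actor-critic under delayed rewards can be sketched in a few lines. The sketch assumes that a single scalar rating arrives only after the full translation is produced, and that the critic predicts that final rating at every decoding step. The function name `a2c_losses` and its arguments are hypothetical, and this is not the paper's exact objective.

```python
def a2c_losses(log_probs, values, reward):
    """Per-sequence actor and critic losses for one sampled translation.

    log_probs[t]: log-probability of the sampled token at step t
    values[t]:    critic's estimate of the final rating at step t
    reward:       single scalar rating for the whole translation
    """
    actor_loss = 0.0
    critic_loss = 0.0
    for log_p, value in zip(log_probs, values):
        # Advantage: how much better the rating was than the critic expected.
        advantage = reward - value
        # Policy-gradient term; the advantage is treated as a constant
        # when differentiating with respect to the actor's parameters.
        actor_loss += -advantage * log_p
        # The critic regresses its per-step estimate onto the observed rating.
        critic_loss += advantage ** 2
    return actor_loss, critic_loss
```

Subtracting the critic's estimate from the rating lowers the variance of the gradient signal compared with using the raw rating, which matters when the action space is a full vocabulary and the only reward comes at the end of the sequence.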
