MAD for Robust Reinforcement Learning in Machine Translation

Domenic Donato; Lei Yu; Wang Ling; Chris Dyer

arXiv:2207.08583 · cs.CL · July 19, 2022 · 5 citations



TL;DR

This paper presents MAD, a distributed policy gradient algorithm that improves training stability and generalization in machine translation through two variance reduction strategies: conditional reward normalization and a robust importance weighting scheme based on the mean absolute deviation.

Contribution

The paper introduces MAD, a novel distributed policy gradient method with variance reduction techniques that outperforms existing reward-aware training methods (REINFORCE, MRT, PPO) in machine translation.

Findings

1. MAD outperforms REINFORCE, MRT, and PPO in stability and generalization.
2. Policies trained with MAD perform well with greedy and beam search decoding.
3. The learned policies are sensitive to the reward functions used during training.

Abstract

We introduce a new distributed policy gradient algorithm and show that it outperforms existing reward-aware training procedures such as REINFORCE, minimum risk training (MRT) and proximal policy optimization (PPO) in terms of training stability and generalization performance when optimizing machine translation models. Our algorithm, which we call MAD (on account of using the mean absolute deviation in the importance weighting calculation), has distributed data generators sampling multiple candidates per source sentence on worker nodes, while a central learner updates the policy. MAD depends crucially on two variance reduction strategies: (1) a conditional reward normalization method that ensures each source sentence has both positive and negative reward translation examples and (2) a new robust importance weighting scheme that acts as a conditional entropy regularizer. Experiments on a…
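The abstract names the two variance reduction strategies but not their formulas. As a minimal sketch, assuming a single source sentence with n sampled candidates, the Python block below standardizes each candidate's reward against the other candidates for the same source (so the group contains both positive and negative rewards) and computes a hypothetical robust importance weight from learner/behavior log-probability ratios scaled by their mean absolute deviation. The function names, the `clip` and `eps` parameters, the use of the population standard deviation, and the centering, clipping and exponentiation steps are illustrative assumptions, not the authors' equations; the distributed worker/learner loop is not modeled.

```python
"""Illustrative sketch of the two variance-reduction ideas named in the MAD abstract.

Assumptions (not taken from the paper): rewards are standardized with the
per-source mean and population standard deviation, and the "robust" importance
weight is a HYPOTHETICAL stand-in that scales log-ratios by their mean absolute
deviation and clips them. The authors' exact formulas may differ.
"""
import math
from typing import List


def conditional_reward_normalization(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize the rewards of all candidates sampled for ONE source sentence.

    Subtracting the per-source mean guarantees that, unless every candidate got
    the same reward, some candidates get positive and some negative rewards.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def mad_scaled_importance_weights(
    logp_learner: List[float],
    logp_behavior: List[float],
    clip: float = 2.0,
    eps: float = 1e-8,
) -> List[float]:
    """HYPOTHETICAL robust importance weights for one source sentence.

    log_ratio_i = log pi(y_i|x) - log mu(y_i|x), where mu is the (possibly stale)
    policy a worker used to sample y_i. The log-ratios are centered, divided by
    their mean absolute deviation, clipped to [-clip, clip], and exponentiated.
    """
    log_ratios = [a - b for a, b in zip(logp_learner, logp_behavior)]
    center = sum(log_ratios) / len(log_ratios)
    mad = sum(abs(d - center) for d in log_ratios) / len(log_ratios)
    weights = []
    for d in log_ratios:
        z = (d - center) / (mad + eps)
        weights.append(math.exp(max(-clip, min(clip, z))))
    return weights


def surrogate_loss(
    rewards: List[float], logp_learner: List[float], logp_behavior: List[float]
) -> float:
    """Per-source policy-gradient surrogate: -mean_i w_i * A_i * log pi(y_i|x).

    In an autodiff framework w_i and A_i would be held constant (stop-gradient)
    and only log pi(y_i|x) would carry gradients; here it is just a number.
    """
    advantages = conditional_reward_normalization(rewards)
    weights = mad_scaled_importance_weights(logp_learner, logp_behavior)
    n = len(rewards)
    return -sum(w * a * lp for w, a, lp in zip(weights, advantages, logp_learner)) / n


if __name__ == "__main__":
    # Four hypothetical candidate translations sampled by a worker for one source.
    sentence_rewards = [0.31, 0.25, 0.40, 0.25]
    print(conditional_reward_normalization(sentence_rewards))
    print(mad_scaled_importance_weights([-4.0, -5.0, -3.5, -6.0], [-4.2, -4.9, -3.9, -5.5]))
```

Dividing by the mean absolute deviation rather than the standard deviation makes the scale less sensitive to a single candidate with an extreme log-ratio, which is one plausible reading of why the abstract calls the weighting "robust".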


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: REINFORCE