Exploring Supervised and Unsupervised Rewards in Machine Translation

Julia Ive; Zixu Wang; Marina Fomicheva; Lucia Specia

arXiv:2102.11403 · cs.CL · February 24, 2021

TL;DR

This paper introduces an entropy-regularised method and a dynamic unsupervised reward method within reinforcement learning for neural machine translation. The aim is to improve generalisation and the translation of ambiguous words by reducing reliance on traditional metrics such as BLEU.
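Why BLEU can be a weak training reward can be seen with a small sentence-level BLEU function. This is a hypothetical, simplified sketch (single reference, whitespace tokenisation, add-one smoothing on the 2- to 4-gram precisions, which is only one of several common smoothing variants). It is not the reward implementation used in the paper or in pysimt.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Smoothed sentence-level BLEU against a single reference (whitespace tokens)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped matches: each hypothesis n-gram is credited at most as often
        # as it appears in the reference, so only the reference wording earns reward.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        if n == 1:
            if matches == 0:
                return 0.0  # no word overlap at all: the reward is exactly zero
            precision = matches / total
        else:
            precision = (matches + 1) / (total + 1)  # add-one smoothing
        log_precision_sum += math.log(precision)
    # Brevity penalty for hypotheses shorter than the reference
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_precision_sum / max_n)
```

With the reference "the cat sat on the mat", a meaning-changing substitution ("the cat sat on the rug") scores well above a meaning-preserving paraphrase ("a feline sat upon the mat"). A hypothesis that shares no words with the reference receives exactly zero. These two effects illustrate the reference bias and the sparsity described in the abstract.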

Contribution

It proposes novel RL techniques based on the Soft Actor-Critic (SAC) framework that encourage exploration and reduce overfitting in neural machine translation models.
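For reference, SAC optimises a maximum-entropy objective: the expected reward plus an entropy bonus weighted by a temperature α. The block below gives this objective in its standard form. Reading it in MT terms is an assumption about notation, not taken from the paper. Under that reading, the state s_t is the source sentence together with the target prefix y_{<t}. The action a_t is the next token drawn from the vocabulary V, and r can be a sentence-level metric such as BLEU. The paper's exact discrete-action formulation, critic, and temperature setting may differ.

```latex
J(\pi) = \sum_{t=1}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\Big[\, r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big) = -\sum_{a \in V} \pi(a \mid s_t) \log \pi(a \mid s_t)
```

Setting α = 0 recovers the plain expected-reward objective. A larger α favours flatter per-step distributions over the vocabulary instead of peaky ones.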

Findings

01. SAC with a BLEU reward reduces overfitting and improves out-of-domain performance.

02. The dynamic unsupervised reward improves the translation of ambiguous words.

03. The proposed methods outperform standard RL approaches on MT tasks.

Abstract

Reinforcement Learning (RL) is a powerful framework for addressing the discrepancy between the loss functions used during training and the final evaluation metrics used at test time. When applied to neural Machine Translation (MT), it minimises the mismatch between the cross-entropy loss and non-differentiable evaluation metrics such as BLEU. However, the suitability of these metrics as reward functions at training time is questionable: they tend to be sparse and biased towards the specific words used in the reference texts. We propose to address this problem by making models less reliant on such metrics in two ways: (a) with an entropy-regularised RL method that not only maximises a reward function but also explores the action space to avoid peaky distributions; (b) with a novel RL method that explores a dynamic unsupervised reward function to balance between exploration and exploitation.…
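The entropy-regularised idea in (a) can be illustrated with a minimal, framework-free sketch. It is a hypothetical example of entropy-regularised REINFORCE, a simpler relative of SAC that has no learned soft Q-function critic, and it is not the authors' implementation. The function names, the scalar `baseline`, and the default `alpha` are illustrative assumptions. The sketch only computes the loss value for one sampled sequence. In practice an autodiff framework such as PyTorch would compute the gradients, with the advantage treated as a constant.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularised_pg_loss(
    step_logits: Sequence[Sequence[float]],
    sampled_ids: Sequence[int],
    reward: float,
    baseline: float = 0.0,
    alpha: float = 0.01,
) -> float:
    """Loss for one sampled translation: -(R - b) * log p(y) - alpha * sum_t H(pi_t).

    step_logits: per-step logits over a (toy) target vocabulary
    sampled_ids: the token id sampled at each decoding step
    reward: sequence-level reward, e.g. sentence-level BLEU (hypothetical choice)
    """
    log_prob = 0.0
    total_entropy = 0.0
    for logits, token_id in zip(step_logits, sampled_ids):
        probs = softmax(logits)
        log_prob += math.log(probs[token_id])
        total_entropy += entropy(probs)
    advantage = reward - baseline
    # Minimising this raises the likelihood of above-baseline samples, while the
    # entropy term rewards flatter (less peaky) per-step distributions.
    return -advantage * log_prob - alpha * total_entropy
```

With `alpha=0.0` this reduces to the usual REINFORCE loss driven only by the metric. Any positive `alpha` lowers the loss more for flat distributions than for peaky ones, and this lower loss is the pressure towards exploration that the abstract refers to.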

Code & Models

Repositories

ImperialNLP/pysimt
PyTorch · Official
