QE-EBM: Using Quality Estimators as Energy Loss for Machine Translation
Gahyun Yoo, Jay Yoon Lee

TL;DR
QE-EBM uses quality estimators as trainable loss functions whose gradients backpropagate directly into the neural machine translation model, yielding significant improvements, especially for low-resource language pairs.
Contribution
The paper presents QE-EBM, a method that employs quality estimators as energy-based loss functions for end-to-end training of machine translation models. This addresses a limitation of reinforcement learning, which cannot exploit the gradients with respect to the QE score.
Findings
QE-EBM outperforms REINFORCE, PPO, and supervised fine-tuning baselines for all target languages tested.
It yields significant BLEU and COMET score improvements for low-resource language translation.
It is effective in both low-resource and high-resource translation scenarios.
Abstract
Reinforcement learning has shown great promise in aligning language models with human preferences in a variety of text generation tasks, including machine translation. For translation tasks, rewards can easily be obtained from quality estimation (QE) models which can generate rewards for unlabeled data. Despite its usefulness, reinforcement learning cannot exploit the gradients with respect to the QE score. We propose QE-EBM, a method of employing quality estimators as trainable loss networks that can directly backpropagate to the NMT model. We examine our method on several low and high resource target languages with English as the source language. QE-EBM outperforms strong baselines such as REINFORCE and proximal policy optimization (PPO) as well as supervised fine-tuning for all target languages, especially low-resource target languages. Most notably, for English-to-Mongolian…
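The abstract's claim that reinforcement learning cannot exploit the gradients with respect to the QE score can be illustrated with a toy sketch. This is not the paper's implementation: the three-token softmax "policy", the linear energy function standing in for a QE model, and every name and value below are assumptions chosen for illustration. The sketch compares a single-sample REINFORCE estimate, which sees only the scalar reward, with the exact gradient obtained by differentiating the energy through the softmax.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def energy(probs, weights):
    """Toy differentiable stand-in for a QE model (hypothetical).

    Lower energy means higher expected quality.
    """
    return -sum(p * w for p, w in zip(probs, weights))


def energy_grad(logits, weights):
    """Exact gradient of the energy w.r.t. the logits.

    Uses the softmax Jacobian: dE/dtheta_i = -p_i * (w_i - sum_j p_j * w_j).
    """
    probs = softmax(logits)
    mean_w = sum(p * w for p, w in zip(probs, weights))
    return [-p * (w - mean_w) for p, w in zip(probs, weights)]


def reinforce_grad(logits, weights, rng):
    """Single-sample score-function (REINFORCE) estimate of the same gradient.

    Only the scalar reward of the sampled token is used; the reward
    function itself is never differentiated.
    """
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs)[0]
    reward = weights[action]
    # Gradient of the loss -reward * log p(action) w.r.t. the logits.
    return [
        -reward * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(probs)
    ]


def train(logits, weights, grad_fn, steps=200, lr=0.5):
    """Plain gradient descent on the logits using the supplied gradient."""
    logits = list(logits)
    for _ in range(steps):
        grad = grad_fn(logits)
        logits = [t - lr * g for t, g in zip(logits, grad)]
    return logits


if __name__ == "__main__":
    weights = [0.1, 0.9, 0.3]   # hypothetical per-token quality scores
    start = [0.0, 0.0, 0.0]     # uniform initial policy
    rng = random.Random(0)

    print("exact gradient:    ", energy_grad(start, weights))
    print("one REINFORCE draw:", reinforce_grad(start, weights, rng))

    trained = train(start, weights, lambda t: energy_grad(t, weights))
    print("trained policy:    ", softmax(trained))
```

In this toy setting the REINFORCE estimate equals the exact gradient only in expectation, so individual draws are noisy, while the differentiated energy gives the same direction deterministically. A real QE model is a neural network scoring whole translations, so the actual method is considerably more involved than this sketch.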
Taxonomy
Topics: Natural Language Processing Techniques
Methods: REINFORCE
