Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models

Sumanta Bhattacharyya; Amirmohammad Rooshenas; Subhajit Naskar; Simeng Sun; Mohit Iyyer; Andrew McCallum

arXiv:2009.13267·cs.CL·September 22, 2021

TL;DR

This paper introduces an energy-based re-ranking method that uses energy models trained to favor samples with higher BLEU scores, significantly improving neural machine translation performance across multiple datasets.

Contribution

The paper proposes an energy-based re-ranking approach that improves NMT by training energy models to prefer higher-quality translation samples, narrowing the gap between the training objective and the evaluation metric.

Findings

01  Improves BLEU scores by up to 4 points on IWSLT'14 German-English

02  Achieves +3.0 BLEU on Sinhala-English translation

03  Enhances WMT'16 English-German translation performance

Abstract

The discrepancy between maximum likelihood estimation (MLE) and task measures such as BLEU score has been studied before for autoregressive neural machine translation (NMT) and has resulted in alternative training algorithms (Ranzato et al., 2016; Norouzi et al., 2016; Shen et al., 2016; Wu et al., 2018). However, MLE training remains the de facto approach for autoregressive NMT because of its computational efficiency and stability. Despite this mismatch between the training objective and the task measure, we notice that the samples drawn from an MLE-trained NMT model support the desired distribution: there are samples with much higher BLEU scores compared to the beam decoding output. To benefit from this observation, we train an energy-based model to mimic the behavior of the task measure (i.e., the energy-based model assigns lower energy to samples with higher BLEU score), which is resulted…
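The re-ranking idea in the abstract can be illustrated with a small sketch. It assumes an energy function is already available; the toy energy table, the function names, and the fixed-margin hinge loss are hypothetical stand-ins chosen for illustration and are not taken from the paper or its repository.

```python
from typing import Callable, Sequence


def rerank_by_energy(candidates: Sequence[str],
                     energy: Callable[[str], float]) -> str:
    """Return the candidate translation with the lowest energy."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return min(candidates, key=energy)


def pairwise_margin_loss(energy_better: float,
                         energy_worse: float,
                         margin: float = 1.0) -> float:
    """Hinge loss on a pair of samples (an assumed, generic ranking loss).

    It is zero once the higher-BLEU sample's energy is lower than the
    lower-BLEU sample's energy by at least `margin`.
    """
    return max(0.0, margin + energy_better - energy_worse)


# Hypothetical energies for three samples drawn from an NMT model.
# In practice these would come from a trained energy-based model.
toy_energies = {
    "the cat sat": -2.0,
    "cat the sat": 0.5,
    "a cat sat down": -1.0,
}

samples = list(toy_energies)
best = rerank_by_energy(samples, toy_energies.__getitem__)

# Correctly ordered pair: no loss. Wrongly ordered pair: positive loss.
loss_ok = pairwise_margin_loss(energy_better=-2.0, energy_worse=0.5)
loss_bad = pairwise_margin_loss(energy_better=0.5, energy_worse=-2.0)
```

Here `best` is the sample the energy function prefers, which replaces the beam-search output as the final translation. The loss shows one plausible way to push energies toward the BLEU ordering during training.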

Code & Models

Repositories

rooshenas/ebr_mt (PyTorch, official)
