TL;DR
This paper introduces an energy-based re-ranking method: an energy model is trained to favor samples with higher BLEU scores and is then used to re-rank candidate translations, improving neural machine translation performance across multiple datasets.
Contribution
It proposes an energy-based re-ranking approach that enhances NMT by training energy models to prioritize higher-quality translation samples, narrowing the gap between the training objective and the evaluation metric.
Findings
Improves BLEU scores by up to 4 points on IWSLT'14 German-English
Achieves +3.0 BLEU on Sinhala-English translation
Enhances WMT'16 English-German translation performance
Abstract
The discrepancy between maximum likelihood estimation (MLE) and task measures such as BLEU score has been studied before for autoregressive neural machine translation (NMT) and resulted in alternative training algorithms (Ranzato et al., 2016; Norouzi et al., 2016; Shen et al., 2016; Wu et al., 2018). However, MLE training remains the de facto approach for autoregressive NMT because of its computational efficiency and stability. Despite this mismatch between the training objective and task measure, we notice that the samples drawn from an MLE-based trained NMT support the desired distribution -- there are samples with much higher BLEU score compared to the beam decoding output. To benefit from this observation, we train an energy-based model to mimic the behavior of the task measure (i.e., the energy-based model assigns lower energy to samples with higher BLEU score), which is resulted…
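The idea in the abstract (lower energy for samples with higher BLEU, then pick the lowest-energy sample) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `rerank`, `pairwise_margin_loss`, the fixed `margin`, and the hard-coded `toy_energy` table are hypothetical and are not taken from the paper, whose actual model and loss may differ.

```python
from typing import Callable, Sequence


def rerank(candidates: Sequence[str], energy: Callable[[str], float]) -> str:
    """Return the candidate translation with the lowest energy."""
    return min(candidates, key=energy)


def pairwise_margin_loss(
    energy_higher_bleu: float,
    energy_lower_bleu: float,
    margin: float = 1.0,
) -> float:
    """Hinge-style ranking loss (an assumed form, not the paper's exact loss).

    The loss is zero once the higher-BLEU sample's energy is at least
    `margin` below the lower-BLEU sample's energy, and positive otherwise.
    """
    return max(0.0, margin + energy_higher_bleu - energy_lower_bleu)


# Hypothetical energies standing in for a trained energy model's outputs.
toy_energy = {
    "a cat sat": 0.7,
    "the cat sat down": -1.2,
    "cat the": 2.0,
}

best = rerank(list(toy_energy), toy_energy.__getitem__)

# Correctly ordered pair: the higher-BLEU sample already has lower energy.
loss_ok = pairwise_margin_loss(energy_higher_bleu=-1.2, energy_lower_bleu=0.7)

# Wrongly ordered pair: the higher-BLEU sample has higher energy.
loss_bad = pairwise_margin_loss(energy_higher_bleu=0.7, energy_lower_bleu=-1.2)
```

In this sketch the candidates would come from sampling the MLE-trained NMT model, and the energy function would be a learned model rather than a lookup table. Minimizing a loss of this kind over pairs of samples pushes the energy ordering toward the BLEU ordering, which is what makes lowest-energy selection a sensible re-ranking rule.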
