TL;DR
This paper empirically examines Minimum Bayes Risk (MBR) decoding in neural machine translation (NMT). MBR retains the length and token frequency biases previously reported for beam search, but it is more robust to copy noise in the training data and to domain shift.
Contribution
It provides an empirical analysis of how MBR decoding behaves on biases and failure cases previously reported for beam search in NMT.
Findings
MBR still exhibits length and token frequency biases, owing to the MT metrics used as utility functions.
MBR increases robustness against copy noise in training data.
MBR improves robustness to domain shift.
Abstract
Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words, and shows poor robustness to copy noise in training data or domain shift. Recent work has tied these shortcomings to beam search -- the de facto standard inference algorithm in NMT -- and Eikema & Aziz (2020) propose to use Minimum Bayes Risk (MBR) decoding on unbiased samples instead. In this paper, we empirically investigate the properties of MBR decoding on a number of previously reported biases and failure cases of beam search. We find that MBR still exhibits a length and token frequency bias, owing to the MT metrics used as utility functions, but that MBR also increases robustness against copy noise in the training data and domain shift.
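To make the decoding rule concrete, here is a minimal sketch of sample-based MBR selection. It is not the paper's implementation: `unigram_f1` is a hypothetical toy utility standing in for a real MT metric, and the sketch assumes the same set of model samples serves as both the candidate list and the pseudo-references.

```python
from collections import Counter
from typing import Callable, Sequence


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Toy utility (an assumption): F1 over whitespace-token overlap."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum((hyp & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(
    samples: Sequence[str],
    utility: Callable[[str, str], float] = unigram_f1,
) -> str:
    """Return the sample with the highest average utility against all samples."""
    if not samples:
        raise ValueError("need at least one sample")

    def expected_utility(candidate: str) -> float:
        # Monte Carlo estimate: each sample acts as a pseudo-reference.
        return sum(utility(candidate, ref) for ref in samples) / len(samples)

    return max(samples, key=expected_utility)


if __name__ == "__main__":
    # Hypothetical samples; in practice these would be drawn from the NMT model.
    samples = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "a cat sat on the mat",
        "the the the",
    ]
    print(mbr_decode(samples))
```

Because the selected output is the one that agrees most with the other samples under the utility, any bias of the utility itself (for example toward particular lengths or frequent tokens) can carry over into the choice, which is consistent with the finding that the remaining biases stem from the MT metrics used.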
