Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation

Mathias Müller; Rico Sennrich

arXiv:2105.08504·cs.CL·May 19, 2021

TL;DR

This paper empirically examines Minimum Bayes Risk (MBR) decoding in neural machine translation. It finds that MBR retains length and token frequency biases, but increases robustness against copy noise in the training data and against domain shift.

Contribution

It provides an empirical analysis of how MBR decoding behaves on previously reported biases and failure cases of beam search in NMT.

Findings

01

MBR still exhibits length and token frequency biases, owing to the MT metrics used as utility functions.

02

MBR increases robustness against copy noise in training data.

03

MBR improves robustness to domain shift.

Abstract

Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words, and shows poor robustness to copy noise in training data or domain shift. Recent work has tied these shortcomings to beam search -- the de facto standard inference algorithm in NMT -- and Eikema & Aziz (2020) propose to use Minimum Bayes Risk (MBR) decoding on unbiased samples instead. In this paper, we empirically investigate the properties of MBR decoding on a number of previously reported biases and failure cases of beam search. We find that MBR still exhibits a length and token frequency bias, owing to the MT metrics used as utility functions, but that MBR also increases robustness against copy noise in the training data and domain shift.
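To make the decoding rule concrete, here is a minimal sketch of sampling-based MBR selection. It is not the authors' implementation: the function names `unigram_f1` and `mbr_decode` are hypothetical, the samples are hard-coded strings rather than draws from an NMT model, and unigram F1 is a toy stand-in for the MT metrics the abstract refers to as utility functions. Each sample serves both as a candidate and as a pseudo-reference, and the candidate with the highest average utility is returned.

```python
from collections import Counter
from typing import Callable, List


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Toy utility (an assumption, not a metric from the paper):
    unigram F1 overlap between two whitespace-tokenized strings."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum((hyp & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(
    samples: List[str],
    utility: Callable[[str, str], float] = unigram_f1,
) -> str:
    """Return the sample with the highest average utility against all
    samples, which act as pseudo-references (including the candidate itself)."""
    if not samples:
        raise ValueError("need at least one sample")

    def expected_utility(candidate: str) -> float:
        return sum(utility(candidate, ref) for ref in samples) / len(samples)

    return max(samples, key=expected_utility)


# Hard-coded stand-ins for samples drawn from a model.
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "the cat is on the mat",
    "copy copy copy",  # an outlier that shares no tokens with the others
]
best = mbr_decode(samples)
print(best)
```

In this toy setting the outlier receives a low average utility because it agrees with none of the other samples, so it is not selected. Swapping `utility` for a different metric changes which candidate wins, which is one way to see why the choice of utility function can introduce its own biases.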

Code & Models

Repositories

ZurichNLP/understanding-mbr
Official
