Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation
Bryan Eikema, Wilker Aziz

TL;DR
This paper argues that the common use of MAP decoding in neural machine translation is inadequate and that alternative decision rules considering the full translation distribution can better address NMT pathologies.
Contribution
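The study shows that some of the known pathologies and biases of NMT are due to MAP decoding rather than to the model's statistical assumptions or to MLE training, and it advocates decision rules that take the whole translation distribution into account, such as minimum Bayes risk decoding.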
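For reference, the two decision rules contrasted here can be written as follows. The notation is an assumption made for this summary and is not taken from the paper: $x$ is the source sentence, $\theta$ the model parameters, $h$ a candidate translation, and $u(y, h)$ a utility that scores $h$ against a hypothetical reference $y$.

```latex
% MAP decoding: pick the single most probable translation (the mode).
y^{\mathrm{MAP}} = \operatorname*{arg\,max}_{h} \; p(h \mid x, \theta)

% Minimum Bayes risk decoding: pick the candidate with the highest
% expected utility under the model's whole translation distribution.
y^{\mathrm{MBR}} = \operatorname*{arg\,max}_{h} \; \mathbb{E}_{p(y \mid x, \theta)}\left[ u(y, h) \right]
```

The MAP rule looks only at the probability of each candidate, whereas the MBR rule weighs every candidate against everything else the model considers plausible.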
Findings
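MAP decoding, which targets only the mode of the translation distribution, is an inadequate decision rule for NMT.
Translation distributions reproduce various statistics of the data well, whereas beam search output strays from those statistics.
Decision rules that consider the full translation distribution can better address known NMT pathologies.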
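The difference between picking the mode and picking the candidate with the highest expected utility can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy distribution is invented, the unigram-F1 utility is a simple stand-in for a real translation metric, and the function names `map_decode` and `mbr_decode` are hypothetical. It is not the authors' implementation. With a real NMT model the expectation could not be enumerated exactly and would typically be estimated from samples.

```python
from collections import Counter


def unigram_f1(hyp: str, ref: str) -> float:
    """Toy utility (an assumption): F1 over bags of whitespace tokens."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def map_decode(dist: dict) -> str:
    """Return the mode: the single highest-probability translation."""
    return max(dist, key=dist.get)


def mbr_decode(dist: dict, utility=unigram_f1) -> str:
    """Return the candidate with the highest expected utility under dist."""
    def expected_utility(h: str) -> float:
        return sum(p * utility(h, y) for y, p in dist.items())
    return max(dist, key=expected_utility)


# Invented toy distribution: the mode is an outlier, while most of the
# probability mass is spread over several mutually similar translations.
toy = {
    "hello": 0.30,
    "the cat sat on the mat": 0.25,
    "the cat sat on a mat": 0.25,
    "the cat sits on the mat": 0.20,
}

map_choice = map_decode(toy)
mbr_choice = mbr_decode(toy)
print("MAP:", map_choice)
print("MBR:", mbr_choice)
```

In this toy case the mode holds only 30% of the probability mass and shares no tokens with the other candidates, so MAP selects it while the expected-utility rule selects a translation that agrees with the remaining 70%.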
Abstract
Recent studies have revealed a number of pathologies of neural machine translation (NMT) systems. Hypotheses explaining these mostly suggest there is something fundamentally wrong with NMT as a model or its training algorithm, maximum likelihood estimation (MLE). Most of this evidence was gathered using maximum a posteriori (MAP) decoding, a decision rule aimed at identifying the highest-scoring translation, i.e. the mode. We argue that the evidence corroborates the inadequacy of MAP decoding more than casts doubt on the model and its training algorithm. In this work, we show that translation distributions do reproduce various statistics of the data well, but that beam search strays from such statistics. We show that some of the known pathologies and biases of NMT are due to MAP decoding and not to NMT's statistical assumptions nor MLE. In particular, we show that the most likely…
