Digging Errors in NMT: Evaluating and Understanding Model Errors from Partial Hypothesis Space
Jianhao Yan, Chenming Wu, Fandong Meng, Jie Zhou

TL;DR
This paper introduces a novel evaluation protocol for neural machine translation (NMT) that measures model errors by how well a model ranks hypotheses across the hypothesis space, revealing significant ranking issues in state-of-the-art models.
Contribution
It proposes a new evaluation framework and approximation methods to analyze NMT model errors beyond traditional metrics, highlighting limitations of current models and search algorithms.
Findings
State-of-the-art Transformer models rank the top hypotheses at roughly chance level.
Model errors are strongly correlated with human judgment.
Search algorithms like beam search exhibit inductive biases affecting translation quality.
Abstract
Solid evaluation of neural machine translation (NMT) is key to its understanding and improvement. Current evaluation of an NMT system is usually built upon a heuristic decoding algorithm (e.g., beam search) and an evaluation metric assessing similarity between the translation and the golden reference. However, this system-level evaluation framework is limited because it evaluates only the one-best hypothesis and is affected by search errors introduced by heuristic decoding algorithms. To better understand NMT models, we propose a novel evaluation protocol, which defines model errors by the model's ranking capability over the hypothesis space. To tackle the problem of the exponentially large space, we propose two approximation methods: top region evaluation, along with an exact top-k decoding algorithm, which finds the top-ranked hypotheses in the whole hypothesis space, and Monte Carlo sampling evaluation, which simulates…
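To make the idea of "ranking capability over the hypothesis space" concrete, here is a minimal Python sketch. It is not the paper's metric, which this summary does not spell out. It assumes that each hypothesis in some set carries a model score (for example, a log-probability) and a quality score from an external metric, and it reports the fraction of hypothesis pairs that the model orders the same way as the quality metric. The function name `pairwise_ranking_accuracy` and the toy numbers are hypothetical, chosen only for illustration. Under this measure, 0.5 corresponds to chance-level ranking.

```python
from itertools import combinations


def pairwise_ranking_accuracy(model_scores, quality_scores):
    """Fraction of hypothesis pairs ordered identically by model and quality metric.

    Illustrative measure only (an assumption, not the paper's definition).
    Pairs with equal quality are skipped; pairs with equal model score count as 0.5.
    """
    if len(model_scores) != len(quality_scores):
        raise ValueError("score lists must have the same length")
    agree = 0.0
    total = 0
    scored = list(zip(model_scores, quality_scores))
    for (m_a, q_a), (m_b, q_b) in combinations(scored, 2):
        if q_a == q_b:
            continue  # the quality metric expresses no preference for this pair
        total += 1
        if m_a == m_b:
            agree += 0.5  # the model expresses no preference: count as a coin flip
        elif (m_a > m_b) == (q_a > q_b):
            agree += 1.0  # model and metric prefer the same hypothesis
    if total == 0:
        raise ValueError("need at least one pair with distinct quality scores")
    return agree / total


# Toy data (made up): four hypotheses for one source sentence.
log_probs = [-1.2, -1.5, -2.0, -2.4]   # hypothetical model log-probabilities
quality = [0.30, 0.55, 0.40, 0.10]     # hypothetical metric scores
accuracy = pairwise_ranking_accuracy(log_probs, quality)
print(f"pairwise ranking accuracy: {accuracy:.3f}")
```

The same function could be applied to a top-k list from an exact decoder or to a set of sampled hypotheses; only the way the hypothesis set is collected changes.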
Taxonomy
Topics: Machine Learning and Data Classification · Software Engineering Research · Natural Language Processing Techniques
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Byte Pair Encoding · Adam · Layer Normalization · Dropout · Multi-Head Attention · Label Smoothing
