Digging Errors in NMT: Evaluating and Understanding Model Errors from Partial Hypothesis Space

Jianhao Yan; Chenming Wu; Fandong Meng; Jie Zhou

arXiv:2106.15217 · cs.CL · October 11, 2022

TL;DR

This paper introduces a novel evaluation protocol for neural machine translation (NMT) that assesses model errors based on ranking hypotheses within the entire hypothesis space, revealing significant ranking issues in state-of-the-art models.

Contribution

It proposes a new evaluation framework and approximation methods to analyze NMT model errors beyond traditional metrics, highlighting limitations of current models and search algorithms.

Findings

1. Transformer models perform at chance level in top hypothesis ranking.

2. Model errors are strongly correlated with human judgment.

3. Search algorithms like beam search exhibit inductive biases affecting translation quality.

Abstract

Solid evaluation of neural machine translation (NMT) is key to its understanding and improvement. Current evaluation of an NMT system is usually built upon a heuristic decoding algorithm (e.g., beam search) and an evaluation metric assessing similarity between the translation and the gold reference. However, this system-level evaluation framework is limited: it evaluates only the one best hypothesis and is affected by search errors introduced by heuristic decoding algorithms. To better understand NMT models, we propose a novel evaluation protocol, which defines model errors by the model's ranking capability over the hypothesis space. To tackle the problem of the exponentially large space, we propose two approximation methods: top region evaluation, along with an exact top-$k$ decoding algorithm that finds top-ranked hypotheses in the whole hypothesis space, and Monte Carlo sampling evaluation, which simulates…
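
The idea of judging a model by how well it ranks hypotheses, rather than by its single best output, can be illustrated with a small sketch. The abstract is truncated here and does not state the paper's actual error measure, so the code below uses Kendall's tau as a stand-in assumption: it compares the ordering induced by model scores with the ordering induced by a quality metric. The function name `kendall_tau`, the toy hypotheses, and all scores are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def kendall_tau(model_scores, quality_scores):
    """Kendall's tau-a between two equally long score lists.

    +1 means the model orders hypotheses exactly as the quality
    metric does, -1 means the order is fully reversed, and values
    near 0 mean the model's ranking is close to chance.
    Tied pairs count as neither concordant nor discordant.
    """
    if len(model_scores) != len(quality_scores):
        raise ValueError("score lists must have the same length")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(model_scores)), 2):
        # Same sign of both differences: the pair is ordered consistently.
        product = (model_scores[i] - model_scores[j]) * (
            quality_scores[i] - quality_scores[j]
        )
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    n_pairs = len(model_scores) * (len(model_scores) - 1) // 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0


# Hypothetical toy data: (hypothesis, model log-probability, quality score).
# None of these numbers come from the paper.
hypotheses = [
    ("the cat sat on the mat", -1.2, 0.95),
    ("the cat sat", -0.8, 0.40),
    ("a cat is sitting on the mat", -2.5, 0.70),
    ("mat the on cat", -3.0, 0.10),
]

model_scores = [h[1] for h in hypotheses]
quality_scores = [h[2] for h in hypotheses]

tau = kendall_tau(model_scores, quality_scores)
print(f"rank agreement (Kendall's tau): {tau:.3f}")
```

In this toy set the model gives its highest score to a short, low-quality hypothesis, so the agreement falls well below 1 even though a one-best evaluation would only ever look at that single output. A rank correlation is only one possible way to quantify ranking ability; the paper's own definition may differ.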

Taxonomy

Topics: Machine Learning and Data Classification · Software Engineering Research · Natural Language Processing Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Byte Pair Encoding · Adam · Layer Normalization · Dropout · Multi-Head Attention · Label Smoothing