Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT
Jing Yang, Biao Zhang, Yue Qin, Xiangwen Zhang, Qian Lin, Jinsong Su

TL;DR
This paper introduces Otem and Utem, two automatic metrics for evaluating over- and under-translation in neural machine translation, showing they correlate well with human judgments and reveal limitations of traditional metrics like BLEU.
Contribution
The paper proposes novel quantitative metrics, Otem and Utem, specifically designed to assess over- and under-translation in NMT, addressing limitations of existing evaluation methods.
Findings
Otem and Utem strongly correlate with human evaluations.
They reveal inconsistencies between BLEU scores and translation quality.
The metrics effectively measure over- and under-translation.
Abstract
Although neural machine translation (NMT) yields promising translation performance, it unfortunately suffers from over- and under-translation issues [Tu et al., 2016], of which studies have become research hotspots in NMT. At present, these studies mainly apply the dominant automatic evaluation metrics, such as BLEU, to evaluate the overall translation quality with respect to both adequacy and fluency. However, they are unable to accurately measure the ability of NMT systems in dealing with the above-mentioned issues. In this paper, we propose two quantitative metrics, the Otem and Utem, to automatically evaluate the system performance in terms of over- and under-translation respectively. Both metrics are based on the proportion of mismatched n-grams between gold reference and system translation. We evaluate both metrics by comparing their scores with human evaluations, where the…
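The abstract describes both metrics only as proportions of mismatched n-grams between the gold reference and the system translation. As a rough illustration, here is a minimal Python sketch built on our own assumptions: the helper names `ngrams` and `mismatch_rates` are hypothetical, an n-gram counts as over-translated once for each candidate occurrence beyond its reference count, and as under-translated once for each reference occurrence the candidate leaves uncovered. It handles one sentence and one n-gram order, with no length penalty and no averaging across orders, so it is not the paper's exact Otem/Utem formula.

```python
from collections import Counter


def ngrams(tokens, n):
    """Hypothetical helper: count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def mismatch_rates(candidate, reference, n=1):
    """Hypothetical over/under-translation rates for a single n-gram order.

    Assumption (not the paper's definition): over-translation is the share of
    candidate n-grams in excess of their reference counts; under-translation
    is the share of reference n-grams the candidate fails to cover.
    """
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    # Counter subtraction keeps only strictly positive differences.
    over = sum((cand - ref).values())
    under = sum((ref - cand).values())
    # Guard against empty inputs to avoid division by zero.
    over_rate = over / max(sum(cand.values()), 1)
    under_rate = under / max(sum(ref.values()), 1)
    return over_rate, under_rate


if __name__ == "__main__":
    cand = "the cat the cat sat".split()   # repeats "the cat"
    ref = "the cat sat on the mat".split()  # candidate drops "on the mat"
    for n in (1, 2):
        over, under = mismatch_rates(cand, ref, n)
        print(f"n={n}: over={over:.2f} under={under:.2f}")
    # prints:
    # n=1: over=0.20 under=0.33
    # n=2: over=0.50 under=0.60
```

Because the sketch relies on plain count differences, it cannot tell a genuinely repeated phrase (true over-translation) from a wrong word that simply never occurs in the reference; a faithful implementation would follow the full definitions given in the paper.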
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
