TL;DR
This study introduces a fine-grained, statistically rigorous manual evaluation method for machine translation (MT) quality. Applied to English-to-Croatian, it shows that neural MT outperforms phrase-based systems, particularly on complex linguistic phenomena.
Contribution
It develops an MQM error taxonomy tailored to Slavic languages and uses statistical significance testing to compare MT systems from different paradigms, highlighting the advantages of neural MT.
Findings
Neural MT reduces errors by 54% compared to phrase-based systems.
Neural MT handles long-distance agreement notably better than the phrase-based systems.
The tailored MQM taxonomy improves annotation accuracy.
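The 54% figure is a relative error reduction: the share of the phrase-based baseline's errors that the neural system avoids. The minimal Python sketch below shows that arithmetic. The function name and the error counts are hypothetical illustrations, not data from the paper.

```python
def relative_error_reduction(baseline_errors: int, system_errors: int) -> float:
    """Share of the baseline's errors that the system avoids:
    (baseline - system) / baseline."""
    if baseline_errors <= 0:
        raise ValueError("baseline error count must be positive")
    return (baseline_errors - system_errors) / baseline_errors


# Hypothetical counts (not from the paper): a phrase-based baseline with
# 500 annotated errors and a neural system with 230.
reduction = relative_error_reduction(500, 230)
print(f"{reduction:.0%}")  # prints "54%"
```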
Abstract
This paper presents a quantitative fine-grained manual evaluation approach to comparing the performance of different machine translation (MT) systems. We build upon the well-established Multidimensional Quality Metrics (MQM) error taxonomy and implement a novel method that assesses whether the differences in performance for MQM error types between different MT systems are statistically significant. We conduct a case study for English-to-Croatian, a language direction that involves translating into a morphologically rich language, for which we compare three MT systems belonging to different paradigms: pure phrase-based, factored phrase-based and neural. First, we design an MQM-compliant error taxonomy tailored to the relevant linguistic phenomena of Slavic languages, which made the annotation process feasible and accurate. Errors in MT outputs were then annotated by two annotators…
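The abstract does not say which statistical test is used. As one possible sketch, assuming we have, for each system, the count of one MQM error type and the number of annotated words, a Pearson chi-squared test on a 2×2 contingency table can check whether two systems differ significantly in the rate of that error type. The helper names, the one-error-per-word assumption, and the counts are hypothetical; this is not the paper's implementation.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) on [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df the statistic is a squared standard normal,
    # so P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def compare_error_type(errors_a: int, words_a: int,
                       errors_b: int, words_b: int,
                       alpha: float = 0.05) -> tuple[float, bool]:
    """Hypothetical helper: does one error type occur at a different rate
    in system A than in system B? Assumes each word carries at most one
    error of this type, so words split into 'erroneous' vs 'clean'."""
    _, p = chi2_2x2(errors_a, words_a - errors_a, errors_b, words_b - errors_b)
    return p, p < alpha


# Hypothetical agreement-error counts over 1,000 annotated words per system.
p, significant = compare_error_type(50, 1000, 20, 1000)
print(f"p = {p:.4f}, significant: {significant}")
```

Because such a test would be repeated for every error type, the significance threshold may need a multiple-comparison correction such as Bonferroni.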
