Quantitative Fine-Grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian

Filip Klubička; Antonio Toral; Víctor M. Sánchez-Cartagena

arXiv:1802.01451·cs.CL·February 13, 2018


TL;DR

This study introduces a fine-grained human evaluation method for machine translation that tests whether differences in error counts between systems are statistically significant. Applied to English-to-Croatian, it shows neural MT outperforming phrase-based systems, most clearly on long-distance agreement.

Contribution

It develops a tailored MQM error taxonomy for Slavic languages and applies statistical analysis to compare different MT paradigms, highlighting neural MT's advantages.

Findings

01

Neural MT, the best-performing system, produces 54% fewer errors than pure phrase-based MT, the worst-performing system.

02

Neural MT is especially effective at long-distance (sentence-level) agreement, where phrase-based approaches are of limited use.

03

The MQM taxonomy tailored to Slavic languages made the annotation process feasible and accurate.

Abstract

This paper presents a quantitative fine-grained manual evaluation approach to comparing the performance of different machine translation (MT) systems. We build upon the well-established Multidimensional Quality Metrics (MQM) error taxonomy and implement a novel method that assesses whether the differences in performance for MQM error types between different MT systems are statistically significant. We conduct a case study for English-to-Croatian, a language direction that involves translating into a morphologically rich language, for which we compare three MT systems belonging to different paradigms: pure phrase-based, factored phrase-based and neural. First, we design an MQM-compliant error taxonomy tailored to the relevant linguistic phenomena of Slavic languages, which made the annotation process feasible and accurate. Errors in MT outputs were then annotated by two annotators…
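The significance check described in the abstract can be illustrated with a small sketch. It assumes a Pearson chi-squared test on a 2x2 table of error-tagged versus error-free tokens for two systems; the excerpt above does not name the exact test or counting unit, so both are assumptions here, and the function name and counts are hypothetical rather than taken from the paper.

```python
import math


def chi_squared_2x2(err_a, total_a, err_b, total_b):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Rows are the two systems; columns are tokens with and without a given
    MQM error type. Assumes every row and column total is non-zero.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    table = [
        [err_a, total_a - err_a],
        [err_b, total_b - err_b],
    ]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_sums)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            stat += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts for one error type: (tokens with error, total tokens).
system_a = (120, 2000)
system_b = (60, 2000)

stat, p_value = chi_squared_2x2(*system_a, *system_b)
print(f"chi2 = {stat:.3f}, p = {p_value:.2e}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

Running the test once per error type and per pair of systems gives a per-category picture of where systems differ; with many comparisons, a multiple-testing correction may be worth considering.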


Code & Models

Repositories

GreenParachute/mqm-eng-cro (official)
