Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation

Tom Kocmi, Vilém Zouhar, Eleftherios Avramidis, Roman Grundkiewicz, Marzena Karpinska, Maja Popović, Mrinmaya Sachan, Mariya Shmatova

arXiv:2406.11580 · cs.CL · October 21, 2024

TL;DR

This paper introduces Error Span Annotation (ESA), a human evaluation method for machine translation that combines the speed of Direct Assessment (DA) with the error-span analysis of MQM, producing annotations that are faster and cheaper than MQM at the same quality.

Contribution

ESA is a new human evaluation protocol for MT that balances cost, speed, and reliability by pairing DA's continuous segment rating with MQM's marking of error spans by severity.

Findings

01

ESA is faster and cheaper than MQM.

02

ESA achieves comparable quality to MQM.

03

ESA does not require expert annotators.

Abstract

High-quality Machine Translation (MT) evaluation relies heavily on human judgments. Comprehensive error classification methods, such as Multidimensional Quality Metrics (MQM), are expensive because they are time-consuming and can only be done by experts, whose availability may be limited, especially for low-resource languages. On the other hand, just assigning overall scores, as in Direct Assessment (DA), is simpler and faster and can be done by translators of any level, but is less reliable. In this paper, we introduce Error Span Annotation (ESA), a human evaluation protocol that combines the continuous rating of DA with the high-level error severity span marking of MQM. We validate ESA by comparing it to MQM and DA for 12 MT systems and one human reference translation (English to German) from WMT23. The results show that ESA offers faster and cheaper annotations than MQM at the same quality…
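The abstract describes ESA as two combined steps: marking error spans with a severity (as in MQM, but without detailed error categories) and giving a DA-style continuous score. The sketch below shows one way such judgments might be stored and averaged per system. It assumes a 0 to 100 score scale and two severity levels ("minor" and "major"); the class and function names (`ErrorSpan`, `SegmentJudgment`, `span_penalty`, `system_scores`) are hypothetical, and the minor=1 / major=5 weights in `span_penalty` are an assumed MQM-style convention, not something taken from this abstract.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean

# Hypothetical record layout for one ESA-style judgment; field names are illustrative.
SEVERITIES = {"minor", "major"}


@dataclass
class ErrorSpan:
    start: int      # character offset into the translation (inclusive)
    end: int        # character offset (exclusive)
    severity: str   # "minor" or "major"; no MQM error category is recorded


@dataclass
class SegmentJudgment:
    system: str
    translation: str
    spans: list[ErrorSpan] = field(default_factory=list)
    score: float = 100.0  # assumed DA-style continuous rating on a 0-100 scale

    def validate(self) -> None:
        # Reject scores outside the scale, unknown severities, and bad offsets.
        if not 0 <= self.score <= 100:
            raise ValueError("score must lie in [0, 100]")
        for s in self.spans:
            if s.severity not in SEVERITIES:
                raise ValueError(f"unknown severity: {s.severity}")
            if not 0 <= s.start < s.end <= len(self.translation):
                raise ValueError("span offsets out of range")


def span_penalty(j: SegmentJudgment, minor: float = 1.0, major: float = 5.0) -> float:
    # ASSUMPTION: MQM-style weights (minor=1, major=5), not taken from the ESA abstract.
    return sum(major if s.severity == "major" else minor for s in j.spans)


def system_scores(judgments: list[SegmentJudgment]) -> dict[str, float]:
    # Validate each judgment, then average its 0-100 score per system.
    by_system: dict[str, list[float]] = {}
    for j in judgments:
        j.validate()
        by_system.setdefault(j.system, []).append(j.score)
    return {name: mean(scores) for name, scores in by_system.items()}


judgments = [
    SegmentJudgment("sysA", "Das ist ein Test.", [ErrorSpan(0, 3, "minor")], 85.0),
    SegmentJudgment("sysA", "Guten Morgen.", [], 95.0),
    SegmentJudgment("sysB", "Das ist Test.", [ErrorSpan(4, 12, "major")], 60.0),
]
print(system_scores(judgments))    # {'sysA': 90.0, 'sysB': 60.0}
print(span_penalty(judgments[2]))  # 5.0
```

Keeping the spans alongside the overall score means the same annotations can support both a DA-style system ranking (from the scores) and a coarse error analysis (from the marked spans and their severities).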

Taxonomy

Topics: Natural Language Processing Techniques