Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation
Tom Kocmi, Vilém Zouhar, Eleftherios Avramidis, Roman Grundkiewicz, Marzena Karpinska, Maja Popović, Mrinmaya Sachan, Mariya Shmatova

TL;DR
This paper introduces Error Span Annotation (ESA), a human evaluation method for machine translation that combines the speed of direct assessment with the detailed error analysis of MQM, achieving reliable results more efficiently.
Contribution
ESA is a novel human evaluation protocol for MT that balances cost, speed, and accuracy by combining the continuous scoring of DA with MQM-style error span marking by severity.
Findings
ESA is faster and cheaper than MQM.
ESA achieves comparable quality to MQM.
ESA does not require expert annotators.
Abstract
High-quality Machine Translation (MT) evaluation relies heavily on human judgments. Comprehensive error classification methods, such as Multidimensional Quality Metrics (MQM), are expensive because they are time-consuming and can only be done by experts, whose availability may be limited, especially for low-resource languages. On the other hand, just assigning overall scores, as in Direct Assessment (DA), is simpler and faster and can be done by translators of any level, but is less reliable. In this paper, we introduce Error Span Annotation (ESA), a human evaluation protocol which combines the continuous rating of DA with the high-level error severity span marking of MQM. We validate ESA by comparing it to MQM and DA for 12 MT systems and one human reference translation (English to German) from WMT23. The results show that ESA offers faster and cheaper annotations than MQM at the same quality…
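The abstract describes an ESA judgment as two parts: error spans marked with a high-level severity (as in MQM, but without an error category) plus a continuous overall rating (as in DA). The sketch below shows one way such a record could be represented and summarised. It is a minimal illustration under stated assumptions, not the authors' tooling: the class and function names are hypothetical, spans are assumed to be character offsets into the translation, severities are assumed to be "minor" and "major", the rating is assumed to lie on a 0 to 100 scale, system scores are assumed to be plain means of the ratings, and the 1/5 penalty weights are borrowed from common MQM practice purely for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean

# Assumed severity levels: ESA marks spans by severity only, no error category.
SEVERITIES = ("minor", "major")

# Hypothetical penalty weights borrowed from common MQM practice; used only to
# show how marked spans could be summarised next to the continuous rating.
MQM_STYLE_WEIGHTS = {"minor": 1.0, "major": 5.0}


@dataclass
class ErrorSpan:
    start: int      # assumed: character offset into the translation (inclusive)
    end: int        # assumed: character offset (exclusive)
    severity: str   # "minor" or "major"


@dataclass
class ESAAnnotation:
    system: str
    translation: str
    spans: list[ErrorSpan] = field(default_factory=list)
    score: float = 100.0  # DA-style continuous rating, assumed 0-100 scale

    def validate(self) -> None:
        """Reject ratings outside 0-100 and malformed or unknown spans."""
        if not 0 <= self.score <= 100:
            raise ValueError(f"score out of range: {self.score}")
        for span in self.spans:
            if span.severity not in SEVERITIES:
                raise ValueError(f"unknown severity: {span.severity}")
            if not 0 <= span.start < span.end <= len(self.translation):
                raise ValueError(f"bad span: {span}")

    def span_penalty(self) -> float:
        """MQM-style penalty computed from the marked spans (assumed weights)."""
        return sum(MQM_STYLE_WEIGHTS[s.severity] for s in self.spans)


def system_scores(annotations: list[ESAAnnotation]) -> dict[str, float]:
    """Hypothetical helper: mean of the continuous ratings per system."""
    by_system: dict[str, list[float]] = {}
    for ann in annotations:
        ann.validate()
        by_system.setdefault(ann.system, []).append(ann.score)
    return {name: mean(scores) for name, scores in by_system.items()}


if __name__ == "__main__":
    # Toy, made-up annotations for two hypothetical systems.
    anns = [
        ESAAnnotation("sysA", "Das ist ein Test.", [ErrorSpan(0, 3, "minor")], 85.0),
        ESAAnnotation("sysA", "Hallo Welt.", [], 95.0),
        ESAAnnotation("sysB", "Das Haus ist blau.", [ErrorSpan(4, 8, "major")], 40.0),
    ]
    print(system_scores(anns))  # {'sysA': 90.0, 'sysB': 40.0}
```

Keeping the spans and the rating in one record reflects the combination the abstract describes: the spans retain MQM-like diagnostic detail, while the rating alone already yields a DA-like system score.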
Taxonomy
Topics: Natural Language Processing Techniques
