Enhancing Human Evaluation in Machine Translation with Comparative Judgment

Yixiao Song; Parker Riley; Daniel Deutsch; Markus Freitag

arXiv:2502.17797·cs.CL·February 26, 2025

TL;DR

This paper investigates comparative judgment as a way to improve the consistency and efficiency of human evaluation in machine translation. It reports that side-by-side (pairwise) annotation yields higher inter-annotator agreement and more consistent error marking than point-wise MQM.

Contribution

It compares three annotation setups (point-wise MQM, SxS MQM, and SxS RR) and shows that pairwise comparative judgment improves inter-annotator agreement and error marking consistency in MT evaluation.

Findings

1. The SxS settings achieve higher inter-annotator agreement than MQM.

2. SxS MQM improves inter-translation error marking consistency over MQM by 38.5% on average for explicitly compared MT systems.

3. SxS RR provides a more efficient evaluation alternative.

Abstract

Human evaluation is crucial for assessing rapidly evolving language models but is influenced by annotator proficiency and task design. This study explores the integration of comparative judgment into human annotation for machine translation (MT) and evaluates three annotation setups: point-wise Multidimensional Quality Metrics (MQM), side-by-side (SxS) MQM, and its simplified version SxS relative ranking (RR). In MQM, annotators mark error spans with categories and severity levels. SxS MQM extends MQM to pairwise error annotation for two translations of the same input, while SxS RR focuses on selecting the better output without labeling errors. Key findings are: (1) the SxS settings achieve higher inter-annotator agreement than MQM; (2) SxS MQM enhances inter-translation error marking consistency compared to MQM by, on average, 38.5% for explicitly compared MT systems and 19.5% for…
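To make the relationship between the setups concrete, the following is a minimal sketch of how error-span annotations could be reduced to a pairwise preference and how agreement between two annotators could then be measured. It is an illustration under stated assumptions, not the paper's procedure: the severity weights, the function names, the toy annotations, and the use of raw percent agreement are all hypothetical, since the excerpt above does not specify the scoring scheme or the agreement statistic.

```python
# Hypothetical severity weights; the abstract does not state the scoring scheme.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0}


def mqm_score(errors):
    """Sum the severity weights of the marked error spans (lower is better)."""
    return sum(SEVERITY_WEIGHTS[severity] for _category, severity in errors)


def preference(score_a, score_b):
    """Return 'A', 'B', or 'tie' for two penalty scores; the lower score wins."""
    if score_a < score_b:
        return "A"
    if score_b < score_a:
        return "B"
    return "tie"


def pairwise_agreement(prefs_1, prefs_2):
    """Fraction of items on which two annotators gave the same label."""
    if len(prefs_1) != len(prefs_2) or not prefs_1:
        raise ValueError("preference lists must be non-empty and equal in length")
    matches = sum(p == q for p, q in zip(prefs_1, prefs_2))
    return matches / len(prefs_1)


# Toy data (invented for illustration): each item holds the errors one annotator
# marked on translation A and on translation B of the same source segment.
annotator_1 = [
    ([("accuracy", "major")], [("fluency", "minor")]),
    ([], [("accuracy", "minor")]),
    ([("fluency", "minor")], [("style", "minor")]),
]
annotator_2 = [
    ([("accuracy", "major")], []),
    ([("fluency", "minor")], [("accuracy", "major")]),
    ([], [("style", "minor")]),
]

# Error-span annotation (MQM-style): derive a preference from the two scores.
prefs_1 = [preference(mqm_score(a), mqm_score(b)) for a, b in annotator_1]
prefs_2 = [preference(mqm_score(a), mqm_score(b)) for a, b in annotator_2]

agreement = pairwise_agreement(prefs_1, prefs_2)
print(prefs_1, prefs_2, round(agreement, 3))
```

In an SxS RR-style setup the annotators would supply the "A", "B", or "tie" labels directly, so the same `pairwise_agreement` function could be applied without the scoring step. A chance-corrected statistic could be substituted for raw agreement if preferred.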

Taxonomy

Topics: Natural Language Processing Techniques