MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation

Parker Riley; Daniel Deutsch; Mara Finkelstein; Colten DiIanni; Juraj Juraska; Markus Freitag

arXiv:2510.24664·cs.CL·October 29, 2025

TL;DR

This paper introduces MQM re-annotation, a two-stage evaluation method for machine translation in which an annotator reviews and edits existing MQM annotations. The second pass yields higher-quality annotations, mostly by catching errors that were missed in the first pass.

Contribution

The paper proposes a two-stage MQM process in which an annotator reviews and edits pre-existing annotations (their own, another human annotator's, or an automatic system's), and shows that this improves annotation quality.

Findings

01

Rater behavior during re-annotation aligns with the authors' goals.

02

Re-annotation results in higher-quality annotations.

03

Most of the quality gain comes from finding errors that were missed during the first pass.

Abstract

Human evaluation of machine translation is in an arms race with translation model quality: as our models get better, our evaluation methods need to be improved to ensure that quality gains are not lost in evaluation noise. To this end, we experiment with a two-stage version of the current state-of-the-art translation evaluation paradigm (MQM), which we call MQM re-annotation. In this setup, an MQM annotator reviews and edits a set of pre-existing MQM annotations, which may have come from themselves, another human annotator, or an automatic MQM annotation system. We demonstrate that rater behavior in re-annotation aligns with our goals, and that re-annotation results in higher-quality annotations, mostly due to finding errors that were missed during the first pass.
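
To make the two-stage setup concrete, the sketch below shows one way the first-pass and re-annotated error sets for a single translation could be compared. It is an illustration only, not the authors' tooling: the `ErrorSpan` fields, the category strings, and the `diff_annotations` helper are all hypothetical names chosen for this example, and the exact-match comparison is an assumed simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorSpan:
    """One marked error in a translation (hypothetical schema)."""
    start: int      # character offset where the span begins (inclusive)
    end: int        # character offset where the span ends (exclusive)
    category: str   # illustrative label, e.g. "accuracy/mistranslation"
    severity: str   # illustrative label, e.g. "minor" or "major"


def _sort_key(span):
    return (span.start, span.end, span.category, span.severity)


def diff_annotations(first_pass, second_pass):
    """Split spans into those kept, deleted, and added by the re-annotator.

    Assumption: two spans count as the same error only if every field
    matches, so an edited severity or category shows up as one deleted
    span plus one added span.
    """
    first, second = set(first_pass), set(second_pass)
    return {
        "kept": sorted(first & second, key=_sort_key),
        "deleted": sorted(first - second, key=_sort_key),
        "added": sorted(second - first, key=_sort_key),
    }


# Made-up example: the re-annotator keeps one error, removes one,
# and adds one that the first pass missed.
first_pass = [
    ErrorSpan(0, 5, "accuracy/mistranslation", "major"),
    ErrorSpan(10, 14, "fluency/grammar", "minor"),
]
second_pass = [
    ErrorSpan(0, 5, "accuracy/mistranslation", "major"),
    ErrorSpan(20, 27, "accuracy/omission", "major"),
]
changes = diff_annotations(first_pass, second_pass)
```

In this framing, the "added" bucket corresponds to errors found only in the second pass, which the abstract identifies as the main source of the quality improvement.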
