The Multi-Range Theory of Translation Quality Measurement: MQM scoring models and Statistical Quality Control

Arle Lommel; Serge Gladkoff; Alan Melby; Sue Ellen Wright; Ingemar Strandvik; Katerina Gasova; Angelika Vaasa; Andy Benzo; Romina Marazzato Sparano; Monica Foresi; Johani Innis; Lifeng Han; Goran Nenadic

arXiv:2405.16969·cs.CL·August 6, 2024·6 cites

Open Access

TL;DR

This paper reviews the 10-year development of the MQM framework for translation quality evaluation, introduces new scoring models, and advocates for statistical quality control for small sample sizes.

Contribution

It presents new linear and non-linear MQM scoring models and a universal approach for different sample sizes, emphasizing statistical quality control for small samples.

Findings

01. Introduction of the Linear Calibrated Scoring Model
02. Presentation of the Non-Linear Scoring Model
03. Advocacy for statistical quality control for small samples

Abstract

The year 2024 marks the 10th anniversary of the Multidimensional Quality Metrics (MQM) framework for analytic translation quality evaluation. The MQM error typology has been widely used by practitioners in the translation and localization industry and has served as the basis for many derivative projects. The annual Conference on Machine Translation (WMT) shared tasks on both human and automatic translation quality evaluations used the MQM error typology. The metric stands on two pillars: error typology and the scoring model. The scoring model calculates the quality score from annotation data, detailing how to convert error type and severity counts into numeric scores to determine if the content meets specifications. Previously, only the raw scoring model had been published. This April, the MQM Council published the Linear Calibrated Scoring Model, officially presented herein, along…
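The abstract describes a scoring model as a conversion from error type and severity counts into a numeric score that is then checked against a specification. The following is a minimal sketch of that idea in the spirit of a raw scoring model, not the paper's Linear Calibrated or Non-Linear model. The severity weights, the per-word normalization, the function names, and the pass threshold are all assumptions made for illustration; none of them are given in the abstract.

```python
from collections import Counter

# Assumed severity weights, chosen for illustration only.
SEVERITY_WEIGHTS = {"neutral": 0, "minor": 1, "major": 5, "critical": 25}


def raw_quality_score(annotations, word_count, weights=SEVERITY_WEIGHTS):
    """Turn (error_type, severity) annotations into a score of at most 1.

    Assumption: the penalty total is the weighted sum of severity counts,
    normalized by the number of evaluated words.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    # Count how many annotated errors fall under each severity level.
    severity_counts = Counter(severity for _, severity in annotations)
    penalty_total = sum(weights[sev] * n for sev, n in severity_counts.items())
    return 1.0 - penalty_total / word_count


def meets_specification(score, threshold):
    """Pass/fail decision against a hypothetical threshold from the specification."""
    return score >= threshold


# Hypothetical annotation data for a 1000-word sample.
annotations = [
    ("accuracy", "minor"),
    ("terminology", "major"),
    ("fluency", "minor"),
]
score = raw_quality_score(annotations, word_count=1000)
passed = meets_specification(score, threshold=0.99)
```

With these assumed weights the three errors carry a penalty total of 7 over 1000 words, so the sample passes a 0.99 threshold. A calibrated or non-linear model would replace the final normalization step rather than the counting step.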

Taxonomy

Topics: Natural Language Processing Techniques