Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!

Stefano Perrella; Lorenzo Proietti; Alessandro Scirè; Edoardo Barba; Roberto Navigli

arXiv:2408.13831·cs.CL·August 27, 2024

TL;DR

This paper introduces sentinel metrics to critically evaluate the meta-evaluation process of machine translation metrics, revealing biases towards certain metric types and concerns about reliance on spurious correlations.

Contribution

It proposes sentinel metrics as a novel tool to scrutinize and improve the robustness and fairness of MT metric meta-evaluation frameworks.

Findings

01  Current meta-evaluation favors metrics trained to mimic human judgments

02  Continuous metrics are disproportionately ranked highly

03  Potential biases and reliance on spurious correlations are identified

Abstract

Annually, at the Conference of Machine Translation (WMT), the Metrics Shared Task organizers conduct the meta-evaluation of Machine Translation (MT) metrics, ranking them according to their correlation with human judgments. Their results guide researchers toward enhancing the next generation of metrics and MT systems. With the recent introduction of neural metrics, the field has witnessed notable advancements. Nevertheless, the inherent opacity of these metrics has posed substantial challenges to the meta-evaluation process. This work highlights two issues with the meta-evaluation framework currently employed in WMT, and assesses their impact on the metrics rankings. To do this, we introduce the concept of sentinel metrics, which are designed explicitly to scrutinize the meta-evaluation process's accuracy, robustness, and fairness. By employing sentinel metrics, we aim to validate our…
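To make the idea of "ranking metrics according to their correlation with human judgments" concrete, here is a minimal sketch in Python. It is not the WMT procedure or the authors' code: the metric names, the scores, and the choice of Pearson correlation are all assumptions made for illustration, and the data is a toy example rather than real results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant score list cannot order translations; treat it as 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_metrics(metric_scores, human_scores):
    """Sort metrics by their correlation with human scores, best first."""
    correlations = {
        name: pearson(scores, human_scores)
        for name, scores in metric_scores.items()
    }
    return sorted(correlations.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): human scores for five translations, plus the
# scores that three made-up metrics assign to the same translations.
human = [90.0, 70.0, 40.0, 20.0, 60.0]
toy_metrics = {
    "metric_a": [0.9, 0.7, 0.4, 0.2, 0.6],   # tracks the human scores closely
    "metric_b": [0.5, 0.6, 0.4, 0.5, 0.3],   # only loosely related
    "constant": [0.5, 0.5, 0.5, 0.5, 0.5],   # ignores the translation entirely
}

ranking = rank_metrics(toy_metrics, human)
for name, corr in ranking:
    print(f"{name}: {corr:.3f}")
```

The constant baseline is included only as a sanity check: a sound ranking procedure should place a metric that ignores its input at the bottom. Whether a real meta-evaluation setup behaves this way is the kind of question the abstract says sentinel metrics are meant to probe.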

Code & Models

Repositories

sapienzanlp/guardians-mt-eval (PyTorch, official)

Videos

Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In! · Underline

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Biomedical Text Mining and Ontologies · Topic Modeling