Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!
Stefano Perrella, Lorenzo Proietti, Alessandro Scirè, Edoardo Barba, Roberto Navigli

TL;DR
This paper introduces sentinel metrics to critically examine how machine translation metrics are meta-evaluated. It reveals that the process favors certain types of metrics and raises concerns that metrics may rely on spurious correlations.
Contribution
It proposes sentinel metrics as a novel tool to scrutinize and improve the robustness and fairness of MT metric meta-evaluation frameworks.
Findings
Current meta-evaluation favors metrics trained to mimic human judgments
Continuous metrics are disproportionately ranked highly
Potential biases in the rankings, and a possible reliance of metrics on spurious correlations, are identified
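To make the point about continuous metrics concrete, here is a toy illustration of one mechanism by which ties can matter. It is an assumption on our part, not the exact statistic used at WMT, which handles ties in more elaborate ways. The sketch computes a simplified pairwise accuracy, where a segment pair counts as correct when the metric's ordering (tie included) matches the human ordering. The `discretize` helper is hypothetical, and all scores are made up.

```python
from itertools import combinations


def sign(x: float) -> int:
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def pairwise_accuracy(human: list[float], metric: list[float]) -> float:
    """Fraction of segment pairs whose ordering (ties included) agrees."""
    pairs = list(combinations(range(len(human)), 2))
    agree = sum(
        sign(human[i] - human[j]) == sign(metric[i] - metric[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def discretize(scores: list[float], step: float) -> list[float]:
    """Hypothetical helper: round scores down onto a coarse grid of width step."""
    return [step * (s // step) for s in scores]


# Hypothetical human judgments (no ties) and a continuous metric that
# orders every segment exactly as the humans do.
human = [0.10, 0.35, 0.40, 0.62, 0.90]
continuous = [0.12, 0.30, 0.41, 0.70, 0.88]
# Same ordering information, but coarsened, so some pairs become ties.
coarse = discretize(continuous, 0.5)

print(pairwise_accuracy(human, continuous))
print(pairwise_accuracy(human, coarse))
```

With these inputs the continuous metric agrees with the humans on all ten pairs, while its coarsened copy never contradicts the human ordering yet collapses four pairs into ties and is credited with only six of ten.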
Abstract
Annually, at the Conference on Machine Translation (WMT), the Metrics Shared Task organizers conduct the meta-evaluation of Machine Translation (MT) metrics, ranking them according to their correlation with human judgments. Their results guide researchers toward enhancing the next generation of metrics and MT systems. With the recent introduction of neural metrics, the field has witnessed notable advancements. Nevertheless, the inherent opacity of these metrics has posed substantial challenges to the meta-evaluation process. This work highlights two issues with the meta-evaluation framework currently employed in WMT, and assesses their impact on the metrics rankings. To do this, we introduce the concept of sentinel metrics, which are designed explicitly to scrutinize the meta-evaluation process's accuracy, robustness, and fairness. By employing sentinel metrics, we aim to validate our…
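The abstract describes ranking metrics by their correlation with human judgments. A minimal sketch of that idea follows, assuming segment-level Pearson correlation as the only statistic and made-up scores for two hypothetical metrics, `metric_a` and `metric_b`. The actual WMT meta-evaluation combines several statistics and levels of granularity, so this is only an illustration.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_metrics(
    human: list[float], metrics: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Return (metric name, correlation with humans) pairs, best first."""
    scored = [(name, pearson(scores, human)) for name, scores in metrics.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical human judgments for five translated segments.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
metrics = {
    "metric_a": [1.1, 1.9, 3.2, 3.9, 5.1],  # closely tracks the humans
    "metric_b": [3.0, 1.0, 4.0, 2.0, 5.0],  # only loosely related
}

for name, r in rank_metrics(human, metrics):
    print(f"{name}: r = {r:.3f}")
```

Pearson is used here only because it is the simplest to write out; a rank correlation such as Kendall's tau could be substituted inside `rank_metrics` without changing the rest.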
Code & Models
- 🤗 sapienzanlp/sentinel-cand-mqm (model)
- 🤗 sapienzanlp/sentinel-src-mqm (model)
- 🤗 sapienzanlp/sentinel-ref-mqm (model)
- 🤗 sapienzanlp/sentinel-cand-da (model)
- 🤗 sapienzanlp/sentinel-src-da (model)
- 🤗 sapienzanlp/sentinel-ref-da (model)
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Biomedical Text Mining and Ontologies · Topic Modeling
