Preliminary Ranking of WMT25 General Machine Translation Systems

Tom Kocmi; Eleftherios Avramidis; Rachel Bawden; Ondřej Bojar; Konstantin Dranch; Anton Dvorkovich; Sergey Dukanov; Natalia Fedorova; Mark Fishel; Markus Freitag; Thamme Gowda; Roman Grundkiewicz; Barry Haddow; Marzena Karpinska; Philipp Koehn; Howard Lakougna; Jessica Lundin; Kenton Murray; Masaaki Nagata; Stefano Perrella; Lorenzo Proietti; Martin Popel; Maja Popović; Parker Riley; Mariya Shmatova; Steinþór Steingrímsson; Lisa Yankovskaya; Vilém Zouhar

arXiv:2508.14909·cs.CL·August 26, 2025

TL;DR

This paper provides preliminary automatic rankings of WMT25 machine translation systems, highlighting potential biases and emphasizing that human evaluation will ultimately determine the official rankings.

Contribution

It offers early automatic evaluation results for WMT25 MT systems, aiding participants before official human-based rankings are released.

Findings

01. Preliminary rankings based on automatic metrics.

02. Bias towards re-ranking techniques in automatic evaluation.

03. Official rankings will rely on human evaluation.

Abstract

We present the preliminary rankings of machine translation (MT) systems submitted to the WMT25 General Machine Translation Shared Task, as determined by automatic evaluation metrics. Because these rankings are derived from automatic evaluation, they may exhibit a bias toward systems that employ re-ranking techniques, such as Quality Estimation or Minimum Bayes Risk decoding. The official WMT25 ranking will be based on human evaluation, which is more reliable and will supersede these results. The purpose of releasing these findings now is to assist task participants with their system description papers, not to provide final findings.
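To illustrate why re-ranking can interact with automatic evaluation, here is a minimal sketch of Minimum Bayes Risk (MBR) selection. It is not taken from the paper or from any submitted system: the function names `unigram_f1` and `mbr_select`, the toy candidate list, and the unigram-overlap utility are all assumptions made for illustration. In practice the utility would typically be a learned MT metric, and if a similar metric is later used for scoring, the selected outputs may be favored, which is the kind of bias the abstract warns about.

```python
from collections import Counter


def unigram_f1(hyp: str, ref: str) -> float:
    """Toy utility (assumption): unigram F1 overlap, a stand-in for a real MT metric."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, utility=unigram_f1):
    """Return the candidate with the highest average utility against the other candidates."""
    if not candidates:
        raise ValueError("candidates must not be empty")

    def expected_utility(i):
        # The other candidates act as pseudo-references for candidate i.
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


# Hypothetical candidate translations sampled from one system for one source sentence.
candidates = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a cat is sitting on the mat",
    "dogs run fast",
]
selected = mbr_select(candidates)
```

The selected candidate is the one that agrees most with the rest of the pool under the chosen utility, so the choice of utility directly shapes which output a system submits.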

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Speech and dialogue systems