Preliminary WMT24 Ranking of General MT Systems and LLMs
Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondrej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Marzena Karpinska, Philipp Koehn, Benjamin Marie, Kenton Murray, Masaaki Nagata, Martin Popel

TL;DR
This paper presents a preliminary automatic ranking of WMT24 general machine translation systems and large language models, serving as an early benchmark before the final human evaluation.
Contribution
It provides an initial automatic ranking of MT systems for WMT24, aiding participants before the official human evaluation results are available.
Findings
Preliminary automatic rankings are established.
Human evaluation will supersede automatic metrics.
Results are intended to help General MT task participants while they write their system submissions.
Abstract
This is the preliminary ranking of WMT24 General MT systems based on automatic metrics. The official ranking will be a human evaluation, which is superior to the automatic ranking and supersedes it. The purpose of this report is not to interpret any findings but only to provide preliminary results to the participants of the General MT task that may be useful during the writing of the system submission.
Taxonomy
Topics: Advanced Research in Systems and Signal Processing
