MT-Ranker: Reference-free machine translation evaluation by inter-system ranking
Ibraheem Muhammad Moosa, Rui Zhang, Wenpeng Yin

TL;DR
MT-Ranker casts reference-free machine translation evaluation as a pairwise ranking problem. It aligns better with human judgments than existing methods, without relying on human annotations.
Contribution
This work formulates reference-free MT evaluation as a pairwise ranking problem and demonstrates its effectiveness using indirect supervision, achieving state-of-the-art results.
Findings
Outperforms existing metrics on WMT benchmarks
Achieves state-of-the-art results on the ACES benchmark
Operates without human annotations
Abstract
Traditionally, Machine Translation (MT) evaluation has been treated as a regression problem -- producing an absolute translation-quality score. This approach has two limitations: i) the scores lack interpretability, and human annotators struggle to give consistent scores; ii) most scoring methods are based on (reference, translation) pairs, limiting their applicability in real-world scenarios where references are absent. In practice, we often care about whether a new MT system is better or worse than some competitors. In addition, reference-free MT evaluation is increasingly practical and necessary. Unfortunately, these two practical considerations have yet to be jointly explored. In this work, we formulate reference-free MT evaluation as a pairwise ranking problem. Given the source sentence and a pair of translations, our system predicts which translation is better. In…
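The pairwise setup described in the abstract (source sentence plus two candidate translations in, a preference out) can be turned into a system-level ranking by counting wins. The sketch below only illustrates that interface and is not the authors' model or code: `Comparator`, `toy_comparator`, and `rank_systems` are hypothetical names, and the comparator is a deliberately naive length heuristic standing in for a learned ranker.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical comparator signature: returns True when translation `a`
# is judged better than translation `b` for the given source sentence.
Comparator = Callable[[str, str, str], bool]


def toy_comparator(source: str, a: str, b: str) -> bool:
    """Placeholder only, NOT the MT-Ranker model.

    Prefers the translation whose word count is closer to the source's.
    A real system would replace this with a trained pairwise classifier.
    """
    n = len(source.split())
    return abs(len(a.split()) - n) <= abs(len(b.split()) - n)


def rank_systems(
    sources: List[str],
    outputs: Dict[str, List[str]],
    better: Comparator,
) -> List[str]:
    """Rank MT systems by their number of pairwise wins over all segments.

    `outputs` maps a system name to its translations, aligned with `sources`.
    No reference translations are used anywhere.
    """
    wins = Counter({name: 0 for name in outputs})
    for i, src in enumerate(sources):
        # Compare every unordered pair of systems on this segment.
        for sys_a, sys_b in combinations(outputs, 2):
            if better(src, outputs[sys_a][i], outputs[sys_b][i]):
                wins[sys_a] += 1
            else:
                wins[sys_b] += 1
    # Most wins first; ties broken by name so the result is deterministic.
    return sorted(wins, key=lambda name: (-wins[name], name))
```

Calling `rank_systems(sources, outputs, toy_comparator)` returns the system names ordered from most to fewest pairwise wins. Win counting is one simple aggregation choice assumed here for illustration; the source does not say how pairwise predictions are aggregated.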
Taxonomy
Topics: Natural Language Processing Techniques
