TransEvalnia: Reasoning-based Evaluation and Ranking of Translations
Richard Sproat, Tianyu Zhao, Llion Jones

TL;DR
TransEvalnia is a reasoning-based translation evaluation system that produces detailed quality assessments and rankings. It matches or exceeds the state-of-the-art MT-Ranker and correlates well with human judgments across multiple language pairs.
Contribution
The paper introduces a prompting-based evaluation framework that uses reasoning to produce fine-grained scores and rankings. It also addresses position bias and shows that its scores correlate strongly with human ratings.
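To make the idea of fine-grained scores feeding a ranking concrete, here is a minimal sketch. It assumes each translation receives one numeric score per evaluation dimension and that the overall score is the plain mean of those scores. The dimension names, the `Evaluation` class, and the averaging rule are all hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Evaluation:
    """Scores for one candidate translation (hypothetical container)."""
    translation: str
    # Dimension name -> score. The names used below are illustrative only.
    dimension_scores: dict

    @property
    def overall(self) -> float:
        # Assumption: overall score is the unweighted mean of the dimensions.
        return mean(list(self.dimension_scores.values()))


def rank(evaluations):
    """Return evaluations sorted from highest to lowest overall score."""
    return sorted(evaluations, key=lambda e: e.overall, reverse=True)


candidates = [
    Evaluation("translation A", {"accuracy": 3, "terminology": 4}),
    Evaluation("translation B", {"accuracy": 5, "terminology": 4}),
]
best = rank(candidates)[0]
```

In this sketch `best` is the candidate with the higher mean score; a real system could just as well report an overall score produced directly by the evaluating LLM.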
Findings
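TransEvalnia matches or exceeds the performance of the state-of-the-art MT-Ranker (Moosa et al. 2024) on the authors' English-Japanese data and on several WMT language pairs.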
Evaluation scores strongly correlate with human judgments.
The system is sensitive to the order in which candidate translations are presented (position bias); the paper proposes methods to mitigate this.
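The position-bias finding can be illustrated with a generic check that is common for LLM-based pairwise judges: query the judge with both presentation orders and accept a preference only when the two answers agree. This summary does not describe the paper's own mitigation methods, so the sketch below is not that method. The `judge` callable and its return convention (0 for the first-presented translation, 1 for the second) are assumptions made for illustration.

```python
from typing import Callable

# Hypothetical judge signature: (source, first, second) -> 0 or 1,
# the index of the translation it prefers in the order presented.
Judge = Callable[[str, str, str], int]


def order_robust_preference(judge: Judge, source: str, a: str, b: str) -> str:
    """Ask the judge in both orders; return 'a', 'b', or 'tie' on disagreement."""
    first_pass = judge(source, a, b)    # a is shown first
    second_pass = judge(source, b, a)   # b is shown first
    winner_1 = "a" if first_pass == 0 else "b"
    winner_2 = "b" if second_pass == 0 else "a"
    return winner_1 if winner_1 == winner_2 else "tie"


def always_first(source: str, first: str, second: str) -> int:
    """A maximally position-biased toy judge: always prefers position 0."""
    return 0


def prefers_longer(source: str, first: str, second: str) -> int:
    """A toy judge that depends only on content, not on position."""
    return 0 if len(first) >= len(second) else 1


biased_result = order_robust_preference(always_first, "src", "short", "much longer")
stable_result = order_robust_preference(prefers_longer, "src", "short", "much longer")
```

A judge whose answer flips with presentation order yields `"tie"`, which flags the comparison as unreliable instead of silently rewarding whichever translation appeared first.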
Abstract
We present TransEvalnia, a prompting-based translation evaluation and ranking system that uses reasoning in performing its evaluations and ranking. This system presents fine-grained evaluations based on a subset of the Multidimensional Quality Metrics (https://themqm.org/), returns an assessment of which translation it deems the best, and provides numerical scores for the various dimensions and for the overall translation. We show that TransEvalnia performs as well as or better than the state-of-the-art MT-Ranker (Moosa et al. 2024) on our own English-Japanese data as well as several language pairs from various WMT shared tasks. Using Anthropic's Claude-3.5-Sonnet and Qwen-2.5-72B-Instruct as the evaluation LLMs, we show that the evaluations returned are deemed highly acceptable to human raters, and that the scores assigned to the translations by Sonnet, as well as other LLMs, correlate…
Taxonomy
Topics: Natural Language Processing Techniques
