Remedy-R: Generative Reasoning for Machine Translation Evaluation without Error Annotations
Shaomu Tan, Ryosuke Mitani, Ritvik Choudhary, Qiyu Wu, Toshiyuki Sekiya, Christof Monz

TL;DR
Remedy-R is a novel, interpretable generative metric for machine translation evaluation that uses reasoning and reinforcement learning to assess translations without error annotations, showing strong performance and practical utility.
Contribution
The paper introduces Remedy-R, a reasoning-based MT evaluation metric trained without error-span annotations. The metric provides step-by-step analysis and can be used in a feedback loop to improve translation quality.
Findings
Remedy-R is competitive with top scalar metrics and GPT-4-based judges on WMT22-24 meta-evaluation.
It generalizes well to other languages and out-of-distribution data.
The Remedy-R Agent improves translation quality across various models.
Abstract
Over the years, automatic MT metrics have hillclimbed benchmarks and presented strong and sometimes human-level agreement with human ratings. Yet they remain black-box, offering little insight into their decision-making and often failing under real-world out-of-distribution (OOD) inputs. We introduce Remedy-R, a reasoning-driven generative MT metric trained with reinforcement learning from pairwise translation preferences, without requiring error-span annotations or distillation from closed LLMs. Remedy-R produces step-by-step analyses of accuracy, fluency, and completeness, followed by a final score, enabling more interpretable assessments. With only 60K training pairs across two language pairs, Remedy-R remains competitive with top scalar metrics and GPT-4-based judges on WMT22-24 meta-evaluation, generalizes to other languages, and exhibits strong robustness on OOD stress tests.…
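The abstract describes a metric that writes an analysis, ends with a final score, and is trained with reinforcement learning from pairwise preferences. The sketch below is one way such a preference signal could be turned into a reward; it is not the authors' implementation. The `Final score:` output format, the function names `extract_score` and `pairwise_reward`, and the 0/1 reward values are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical output convention: the generated analysis ends with a line
# such as "Final score: 85". The real Remedy-R format is not given here.
SCORE_PATTERN = re.compile(r"Final score:\s*(\d+(?:\.\d+)?)")


def extract_score(analysis: str) -> Optional[float]:
    """Return the last 'Final score' value in a generated analysis, if any."""
    matches = SCORE_PATTERN.findall(analysis)
    return float(matches[-1]) if matches else None


def pairwise_reward(analysis_a: str, analysis_b: str, human_prefers_a: bool) -> float:
    """Assumed reward: 1.0 when the two scores rank the translations the same
    way as the human preference, 0.0 otherwise (including ties and outputs
    with no parseable score)."""
    score_a = extract_score(analysis_a)
    score_b = extract_score(analysis_b)
    if score_a is None or score_b is None or score_a == score_b:
        return 0.0
    return 1.0 if (score_a > score_b) == human_prefers_a else 0.0


if __name__ == "__main__":
    a = "Accuracy: faithful. Fluency: natural. Completeness: full.\nFinal score: 85"
    b = "Accuracy: one mistranslation. Fluency: stiff.\nFinal score: 60"
    print(pairwise_reward(a, b, human_prefers_a=True))
```

A reward of this shape needs only a preference label per translation pair, which is consistent with the abstract's point that no error-span annotations are required.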
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Artificial Intelligence in Healthcare and Education
