SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation

Wenjie Yang; Mao Zheng; Mingyang Song; Zheng Li; Sitong Wang

arXiv:2505.16637 · cs.CL · April 28, 2026

TL;DR

SSR-Zero introduces a reference-free, self-rewarding reinforcement learning framework for machine translation that trains fully online on monolingual data; combined with COMET supervision, it achieves state-of-the-art results.

Contribution

The paper presents the first fully self-rewarding, reference-free RL method for MT. It surpasses existing MT-specific LLMs and larger general LLMs, and it can be combined with external supervision (e.g., COMET) for further improvements.

Findings

01

SSR-Zero outperforms existing MT-specific LLMs and larger general LLMs in English ↔ Chinese translation.

02

Augmenting SSR with external supervision from COMET yields state-of-the-art performance.

03

The self-rewarding mechanism is more effective than using an external LLM as a judge for MT.

Abstract

Large language models (LLMs) have recently demonstrated remarkable capabilities in machine translation (MT). However, most advanced MT-specific LLMs rely heavily on external supervision signals during training, such as human-annotated reference data or trained reward models (RMs), which are often expensive to obtain and challenging to scale. To overcome this limitation, we propose a Simple Self-Rewarding (SSR) Reinforcement Learning (RL) framework for MT that is reference-free, fully online, and relies solely on self-judging rewards. Trained with SSR on 13K monolingual examples with Qwen2.5-7B as the backbone, our model SSR-Zero-7B outperforms existing MT-specific LLMs, e.g., TowerInstruct-13B and GemmaX-28-9B, as well as larger general LLMs like Qwen2.5-32B-Instruct, in English ↔ Chinese translation tasks from the WMT23, WMT24, and Flores200 benchmarks. Furthermore, by…
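The abstract describes the training signal only at a high level. The model translates monolingual source sentences and is rewarded by its own judgment, with no reference translation and no separately trained reward model. The Python sketch below illustrates one such step under stated assumptions. `self_reward_step`, `toy_translate`, and `toy_judge` are hypothetical names, and the toy callables stand in for the single LLM acting as both translator and judge. The group-normalized advantage is a GRPO-style choice that the abstract does not specify. The policy-gradient update itself is omitted.

```python
import random
import statistics
from typing import Callable, List, Tuple

# Hypothetical type aliases: in SSR a single LLM plays both roles.
Translator = Callable[[str, random.Random], str]
Judge = Callable[[str, str], float]


def self_reward_step(
    source: str,
    translate: Translator,
    judge: Judge,
    group_size: int = 4,
    seed: int = 0,
) -> List[Tuple[str, float, float]]:
    """Score a group of sampled translations of one monolingual sentence.

    Returns (candidate, reward, advantage) triples. No reference translation
    is used: the reward comes only from `judge`, which stands in for the
    model grading its own output. The group normalization is a GRPO-style
    assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    # Online sampling: candidates are drawn from the current policy.
    candidates = [translate(source, rng) for _ in range(group_size)]
    # Self-judging rewards: no reference, no separate reward model.
    rewards = [judge(source, c) for c in candidates]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # If every candidate got the same score, there is no learning signal.
    advantages = [(r - mean) / std if std > 0 else 0.0 for r in rewards]
    return list(zip(candidates, rewards, advantages))


# Hypothetical toy policy: picks one of a few canned Chinese outputs.
def toy_translate(source: str, rng: random.Random) -> str:
    return rng.choice(["你好，世界", "你好世界", "世界你好", "你好"])


# Hypothetical toy self-judge: a fixed score table standing in for the score
# the same LLM would assign when prompted to rate its own translation.
def toy_judge(source: str, candidate: str) -> float:
    scores = {"你好，世界": 9.0, "你好世界": 8.0, "世界你好": 4.0, "你好": 3.0}
    return scores.get(candidate, 0.0)


if __name__ == "__main__":
    for cand, reward, adv in self_reward_step("Hello, world", toy_translate, toy_judge):
        print(f"{cand}\treward={reward:.1f}\tadvantage={adv:+.2f}")
```

In a real run, `translate` and `judge` would both be calls into the same policy model, with the judge prompted to score its own output, and the advantages would weight each candidate's log-probabilities in the RL loss. Adding an external score such as COMET to `judge`'s output is one plausible way to realize the COMET-augmented variant, though how the paper combines the two signals is not stated here.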
