RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation

Tianjiao Li; Mengran Yu; Chenyu Shi; Yanjun Zhao; Xiaojing Liu; Qiang Zhang; Qi Zhang; Xuanjing Huang; Jiayin Wang

arXiv:2506.05070 · cs.CL · August 6, 2025

TL;DR

This paper introduces RIVAL, an adversarial training framework for machine translation that addresses reward model divergence issues in reinforcement learning, leading to improved translation quality especially for colloquial subtitles.

Contribution

RIVAL formulates translation training as a min-max game between the reward model (RM) and the LLM, and incorporates both qualitative and quantitative rewards to keep training stable and effective.
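As a rough illustration of what such a min-max game can look like, the generic adversarial objective below has the RM maximize a score gap that the LLM tries to minimize. This is a sketch under stated assumptions, not the paper's exact objective: the symbols $\pi_{\theta}$ (the LLM policy), $r_{\phi}$ (the RM score), $y^{+}$ (a strong reference translation of source $x$), and $\mathcal{D}$ (the training set) are notation introduced here, and the quantitative reward term is omitted because the summary does not specify it.

```latex
\min_{\theta}\;\max_{\phi}\;
\mathbb{E}_{x \sim \mathcal{D}}
\Big[
  r_{\phi}\big(x, y^{+}\big)
  \;-\;
  \mathbb{E}_{y \sim \pi_{\theta}(\cdot \mid x)}\, r_{\phi}(x, y)
\Big]
```

Under this reading, the inner maximization trains the RM to separate strong translations from the LLM's current outputs, and the outer minimization trains the LLM to shrink that gap.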

Findings

01 RIVAL outperforms baseline translation models in experiments.

02 Incorporating quantitative rewards stabilizes training.

03 Adversarial training improves translation quality for colloquial subtitles.

Abstract

Large language models (LLMs) possess strong multilingual capabilities, and combining Reinforcement Learning from Human Feedback (RLHF) with translation tasks has shown great potential. However, we observe that this paradigm performs unexpectedly poorly when applied to colloquial subtitle translation tasks. In this work, we investigate this issue and find that the offline reward model (RM) gradually diverges from the online LLM due to distributional shift, ultimately leading to undesirable training outcomes. To address this, we propose RIVAL, an adversarial training framework that formulates the process as a min-max game between the RM and the LLM. RIVAL iteratively updates both models, with the RM trained to distinguish strong from weak translations (qualitative preference reward), and the LLM trained to improve its translations to close this gap. To stabilize training and improve…
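The alternating update described in the abstract can be pictured with a deliberately tiny toy, shown here only to make the loop structure concrete. It is not the authors' implementation: each translation is reduced to a single hypothetical "quality feature", the RM is a one-parameter linear scorer trained with a pairwise (Bradley-Terry style) preference loss, and the "LLM" is a single number that is nudged in the direction the RM rewards. All names (`rm_step`, `policy_step`, `train`) and hyperparameters are assumptions made for this sketch, and the quantitative reward is left out.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def rm_loss(w: float, strong: float, weak: float) -> float:
    """Pairwise preference loss: low when the RM scores `strong` above `weak`."""
    margin = w * strong - w * weak
    return -math.log(sigmoid(margin))


def rm_step(w: float, strong: float, weak: float, lr: float) -> float:
    """One gradient-descent step on rm_loss with respect to the RM weight w."""
    margin = w * (strong - weak)
    grad = -(1.0 - sigmoid(margin)) * (strong - weak)
    return w - lr * grad


def policy_step(mu: float, w: float, lr: float) -> float:
    """Move the toy policy output mu so that its RM score w * mu increases."""
    return mu + lr * w  # d(w * mu) / d(mu) = w


def train(rounds: int, strong: float = 1.0, lr_rm: float = 0.5, lr_pi: float = 0.1):
    """Alternate RM and policy updates; return the (w, mu) pair after each round."""
    w, mu = 0.0, 0.0  # toy starting point: untrained RM, weak policy
    history = []
    for _ in range(rounds):
        # RM update: the current policy output plays the role of the weak translation.
        w = rm_step(w, strong, mu, lr_rm)
        # Policy update: chase the freshly updated RM rather than a frozen one.
        mu = policy_step(mu, w, lr_pi)
        history.append((w, mu))
    return history
```

Calling `train(3)` returns three `(w, mu)` pairs in which `mu` moves toward the strong reference value. The point of the toy is only that the RM is refreshed against the policy's current outputs every round, instead of being trained once offline and then drifting away from the policy's output distribution.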

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques