Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

Yifeng Liu; Siqi Ouyang; Yatish Hosmane Revanasiddappa; Lei Li

arXiv:2603.13045 · cs.CL · March 16, 2026

TL;DR

This paper introduces WALAR, a reinforcement learning method that uses only monolingual data to improve the translation quality of large language models on low-resource languages. It mitigates the reward hacking that arises when RL relies on multilingual quality estimation models, and it outperforms strong multilingual baselines such as LLaMAX.

Contribution

WALAR is a novel reinforcement learning approach that mitigates reward hacking in multilingual translation models using only monolingual data, improving translation quality for low-resource languages.

Findings

01. WALAR outperforms LLaMAX on 1400 language directions.
02. WALAR mitigates reward hacking in multilingual RL training.
03. WALAR supports translation for 101 languages.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capability in machine translation on high-resource language pairs, yet their performance on low-resource translation still lags behind. Existing post-training methods rely heavily on high-quality parallel data, which are often scarce or unavailable for low-resource languages. In this paper, we introduce WALAR, a reinforcement learning method that uses only monolingual text to improve LLMs' translation capabilities across a large number of low-resource languages while retaining their performance on high-resource languages. Our key insight comes from observing failure modes (or "holes") in existing source-based multilingual quality estimation (QE) models. Reinforcement learning (RL) that uses these QE models as rewards tends to amplify such holes, resulting in poorer multilingual LLMs. We develop techniques including word alignment and language alignment…
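The abstract is cut off before it explains how word alignment and language alignment are used, so the sketch below is hypothetical and is not WALAR's actual reward. It only illustrates the general idea of patching two plausible holes in a source-based QE reward: a translation in the wrong language (for example, a copy of the source) and a translation that silently drops content. The `Rollout` type, the `alignment_coverage` and `shaped_reward` helpers, the multiplicative gating, the ISO 639-3 style language codes, and the assumption that QE scores lie in [0, 1] are all illustrative choices; the language ID and word alignments are assumed to come from external models that are not shown.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled translation plus signals from external models (hypothetical inputs)."""
    qe_score: float                    # source-based QE score, assumed normalized to [0, 1]
    detected_lang: str                 # language ID of the output, from an assumed LID model
    src_len: int                       # number of words in the source sentence
    alignments: list[tuple[int, int]]  # (source_idx, target_idx) links from an assumed aligner


def alignment_coverage(src_len: int, alignments: list[tuple[int, int]]) -> float:
    """Fraction of source words linked to at least one target word."""
    if src_len == 0:
        return 0.0
    covered = {s for s, _ in alignments if 0 <= s < src_len}
    return len(covered) / src_len


def shaped_reward(r: Rollout, target_lang: str) -> float:
    """Hypothetical reward: gate the QE score with language and word-alignment checks.

    - Output in the wrong language gets zero reward, so fluent text in a
      high-resource language cannot earn a high QE score.
    - Otherwise the QE score is scaled by source coverage, so dropped content
      lowers the reward even if the QE model rates the output highly.
    """
    if r.detected_lang != target_lang:
        return 0.0
    return r.qe_score * alignment_coverage(r.src_len, r.alignments)


# Usage with made-up numbers (illustration only):
good = Rollout(0.8, "swh", 4, [(0, 0), (1, 1), (2, 2), (3, 3)])
copied = Rollout(0.9, "eng", 4, [(0, 0), (1, 1), (2, 2), (3, 3)])
truncated = Rollout(0.85, "swh", 4, [(0, 0)])
print(shaped_reward(good, "swh"))       # 0.8
print(shaped_reward(copied, "swh"))     # 0.0
print(shaped_reward(truncated, "swh"))  # 0.2125
```

Gating on language ID removes the incentive to emit fluent text in a high-resource language, and scaling by alignment coverage makes dropping source words costly. A real system would still need to choose the aligner, the language identifier, and the combination rule empirically.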

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification