Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation
Yifeng Liu, Siqi Ouyang, Yatish Hosmane Revanasiddappa, Lei Li

TL;DR
This paper introduces WALAR, a reinforcement learning method that uses only monolingual data to improve low-resource translation in large language models. It mitigates reward hacking during training and outperforms LLaMAX on 1400 language directions.
Contribution
WALAR is a novel reinforcement learning approach that mitigates reward hacking in multilingual translation models using only monolingual data, enhancing low-resource language performance.
Findings
WALAR outperforms LLaMAX on 1400 language directions.
Mitigates reward hacking in multilingual RL training.
Supports translation for 101 languages.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capability in machine translation on high-resource language pairs, yet their performance on low-resource translation still lags behind. Existing post-training methods rely heavily on high-quality parallel data, which are often scarce or unavailable for low-resource languages. In this paper, we introduce WALAR, a reinforcement training method using only monolingual text to elevate LLMs' translation capabilities on massive low-resource languages while retaining their performance on high-resource languages. Our key insight is based on the observation of failure modes (or "holes") in existing source-based multilingual quality estimation (QE) models. Reinforcement learning (RL) using these QE models tends to amplify such holes, resulting in poorer multilingual LLMs. We develop techniques including word alignment and language alignment…
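The abstract names word alignment and language alignment as the remedies for QE "holes" but, in the truncated text available here, does not say how they enter the reward. The sketch below is therefore only an illustration of the general idea, not WALAR's actual formulation: it assumes a QE score in [0, 1], a language-ID label for the output, and a set of source token indices that a word aligner matched to the translation. The function name `shaped_reward`, the `min_coverage` threshold, and the multiplicative combination are all hypothetical choices made for this example.

```python
from typing import AbstractSet, Sequence


def shaped_reward(
    qe_score: float,
    detected_lang: str,
    target_lang: str,
    source_tokens: Sequence[str],
    aligned_source_indices: AbstractSet[int],
    min_coverage: float = 0.5,  # hypothetical threshold, not from the paper
) -> float:
    """Toy reward: a QE score gated by language match and scaled by alignment coverage."""
    # Language gate (assumed): an output in the wrong language earns nothing,
    # even if the QE model happens to score it highly.
    if detected_lang != target_lang:
        return 0.0
    if not source_tokens:
        return 0.0

    # Coverage: fraction of source tokens that the aligner linked to some
    # target token. Out-of-range indices are ignored.
    valid = {i for i in aligned_source_indices if 0 <= i < len(source_tokens)}
    coverage = len(valid) / len(source_tokens)

    # Alignment gate (assumed): a translation that drops most of the source
    # is treated as a likely exploit of the QE model and earns nothing.
    if coverage < min_coverage:
        return 0.0

    # Otherwise, scale the QE score by how much of the source was covered.
    return qe_score * coverage
```

In this toy setup, a fluent output in the wrong language or one that omits most of the source gets zero reward regardless of its QE score, which is the kind of failure an unconstrained QE reward could otherwise reinforce.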
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
