LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance
Yuchun Fan, Bei Li, Peiguang Li, Yilin Wang, Yongyu Mu, Jian Yang, Xin Chen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Jingbo Zhu, Tong Xiao

TL;DR
The paper introduces LANG, a reinforcement learning framework that improves multilingual reasoning in large language models by using language-conditioned hints and adaptive strategies to balance reasoning quality and language consistency.
Contribution
LANG is a novel framework that employs language-adaptive hint guidance and decay schedules to enhance multilingual reasoning without losing language fidelity.
Findings
LANG significantly improves reasoning performance on multilingual benchmarks.
The framework maintains language consistency while boosting reasoning accuracy.
LANG generalizes beyond mathematics and improves language alignment across model layers.
Abstract
Reinforcement learning has proven effective for enhancing multi-step reasoning in large language models (LLMs), yet its benefits have not fully translated to multilingual contexts. Existing methods struggle with a fundamental trade-off: prioritizing input-language consistency severely hampers reasoning quality, while prioritizing reasoning often leads to unintended language drift toward English. We address this challenge with LANG, a novel framework that leverages language-conditioned hints to guide exploration in non-English reasoning tasks. Our method incorporates two key mechanisms to prevent dependency on these hints: a progressive decay schedule that gradually withdraws scaffolding, and a language-adaptive switch that tailors learning horizons to specific language difficulties. Empirical results on challenging multilingual mathematical benchmarks reveal that LANG substantially…
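The two mechanisms named in the abstract, a progressive decay schedule and language-specific learning horizons, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the linear decay shape, the `HintSchedule` and `build_prompt` names, the word-level hint truncation, and the example horizons for `fr` and `sw`. The paper's actual schedule and switching criterion may differ.

```python
from dataclasses import dataclass


@dataclass
class HintSchedule:
    """Hypothetical per-language hint schedule (not the paper's formulation)."""

    horizon: int              # training steps until hints are fully withdrawn
    start_ratio: float = 1.0  # fraction of the hint revealed at step 0

    def hint_ratio(self, step: int) -> float:
        """Linearly decay the revealed hint fraction from start_ratio to 0."""
        if step >= self.horizon:
            return 0.0
        return self.start_ratio * (1.0 - step / self.horizon)


def build_prompt(question: str, hint: str, ratio: float) -> str:
    """Append the first `ratio` fraction of the hint's words to the question."""
    words = hint.split()
    keep = int(len(words) * ratio)
    if keep == 0:
        return question  # scaffolding fully withdrawn
    return f"{question}\n{' '.join(words[:keep])}"


# Assumed horizons: a language treated as harder keeps its hints for longer.
SCHEDULES = {
    "fr": HintSchedule(horizon=100),
    "sw": HintSchedule(horizon=400),
}

if __name__ == "__main__":
    for step in (0, 50, 100, 200, 400):
        ratios = {lang: s.hint_ratio(step) for lang, s in SCHEDULES.items()}
        print(step, ratios)
```

Under these assumed horizons, the `fr` schedule reaches a hint ratio of zero at step 100 while the `sw` schedule still reveals part of the hint, which is one simple way to read "tailoring learning horizons to specific language difficulties".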
