Language as a Latent Variable for Reasoning Optimization

Linjuan Wu; Haoran Wei; Jialong Tang; Shuang Luo; Baosong Yang; Yongliang Shen; Weiming Lu

arXiv:2604.21593·cs.CL·May 5, 2026

TL;DR

This paper explores how language functions as a latent variable influencing reasoning in multilingual models, introducing a novel RL framework that enhances reasoning accuracy across languages without relying on chain-of-thought annotations.

Contribution

It proposes polyGRPO, a reinforcement learning method that leverages language variation as an exploration signal to improve multilingual reasoning performance.

Findings

01

Non-English responses often outperform English on reasoning tasks.

02

Unconstrained language conditions frequently yield the best reasoning accuracy.

03

polyGRPO improves multilingual math problem accuracy by over 6%.

Abstract

As LLMs reduce English-centric bias, a surprising trend emerges: non-English responses sometimes outperform English on reasoning tasks. We hypothesize that language functions as a latent variable that structurally modulates the model's internal inference pathways, rather than merely serving as an output medium. To test this, we conducted a Polyglot Thinking Experiment, in which models were prompted to solve identical problems under language-constrained and language-unconstrained conditions. Results show that non-English responses often achieve higher accuracy, and the best performance frequently occurs when language is unconstrained, suggesting that multilinguality broadens the model's latent reasoning space. Based on this insight, we propose polyGRPO (Polyglot Group Relative Policy Optimization), an RL framework that treats language variation as an implicit exploration signal. It…
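
The abstract is cut off before polyGRPO is described in detail, so the following is only a minimal sketch of the group-relative advantage step used in standard GRPO, not the authors' method. It assumes, purely for illustration, that one group contains responses to the same problem sampled in different languages; the function name `group_relative_advantages`, the language tags, and the 0/1 rewards are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantage: centre and scale each reward
    by the mean and standard deviation of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: one math problem, four sampled responses.
# Tags and rewards are made up; reward 1.0 means the final answer was correct.
samples = [("en", 0.0), ("zh", 1.0), ("fr", 1.0), ("unconstrained", 0.0)]

advantages = group_relative_advantages([reward for _, reward in samples])

for (lang, reward), adv in zip(samples, advantages):
    print(f"{lang:>13}  reward={reward:.1f}  advantage={adv:+.3f}")
```

Under this assumption, correct responses receive positive advantages and incorrect ones negative advantages relative to the group, regardless of which language produced them. If every response in a group earns the same reward, all advantages are zero and the group contributes no learning signal.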
