TAPO: Translation Augmented Policy Optimization for Multilingual Mathematical Reasoning

Xu Huang; Zhejian Lai; Zixian Huang; Jiajun Chen; Shujian Huang

arXiv:2603.25419·cs.CL·March 27, 2026

TL;DR

TAPO is a reinforcement learning framework, built on GRPO, that improves multilingual mathematical reasoning in LLMs by rewarding translation quality and enforcing an explicit alignment strategy in which the model pivots through English before reasoning.

Contribution

Introduces TAPO, a novel reinforcement learning approach that combines translation quality rewards with an understand-then-reason paradigm for multilingual reasoning.

Findings

01. Outperforms baseline methods in multilingual reasoning tasks

02. Generalizes well to unseen languages and out-of-domain tasks

03. Effectively integrates translation and reasoning capabilities

Abstract

Large Language Models (LLMs) have demonstrated remarkable proficiency in English mathematical reasoning, yet a significant performance disparity persists in multilingual contexts, largely attributed to deficiencies in language understanding. To bridge this gap, we introduce Translation-Augmented Policy Optimization (TAPO), a novel reinforcement learning framework built upon GRPO. TAPO enforces an explicit alignment strategy where the model leverages English as a pivot and follows an understand-then-reason paradigm. Crucially, we employ a step-level relative advantage mechanism that decouples understanding from reasoning, allowing the integration of translation quality rewards without introducing optimization conflicts. Extensive experiments reveal that TAPO effectively synergizes language understanding with reasoning capabilities and is compatible with various models. It outperforms…
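The abstract does not spell out how the step-level relative advantage is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each sampled response consists of a translation (understanding) step followed by a reasoning step, that each step receives its own scalar reward, and that each reward is normalized within the sampled group in the usual GRPO manner. The function names, the two-segment split, and the reward values are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative(rewards, eps=1e-6):
    """GRPO-style normalization within one sampled group:
    (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def step_level_advantages(translation_rewards, answer_rewards):
    """Assumed decoupling: normalize the translation reward and the
    answer reward separately, so neither signal is mixed into the other.
    Returns one (understanding_adv, reasoning_adv) pair per response."""
    adv_understand = group_relative(translation_rewards)
    adv_reason = group_relative(answer_rewards)
    return list(zip(adv_understand, adv_reason))


def per_token_advantages(pair, n_translation_tokens, n_reasoning_tokens):
    """Assumed credit assignment: translation tokens receive the
    understanding advantage, reasoning tokens the reasoning advantage."""
    adv_understand, adv_reason = pair
    return [adv_understand] * n_translation_tokens + [adv_reason] * n_reasoning_tokens


if __name__ == "__main__":
    # Hypothetical group of four sampled responses to one non-English problem.
    translation_rewards = [0.9, 0.5, 0.7, 0.3]  # e.g. a translation-quality score
    answer_rewards = [1.0, 0.0, 0.0, 1.0]       # e.g. final-answer correctness
    for pair in step_level_advantages(translation_rewards, answer_rewards):
        print(pair)
```

Under these assumptions, a response with a good translation but a wrong final answer (the third one in the example) gets a positive advantage on its translation tokens and a negative one on its reasoning tokens. That is one way the two rewards could be combined without pulling the same tokens in opposite directions.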

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Topic Modeling