Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?

Deokhyung Kang; Seonjeong Hwang; Daehui Kim; Hyounghun Kim; Gary Geunbae Lee

arXiv:2510.27269·cs.CL·April 14, 2026

TL;DR

This paper identifies language understanding failures as the main cause of multilingual reasoning gaps in reasoning language models and proposes a selective translation method to mitigate this issue effectively.

Contribution

The paper shows that understanding failures can be detected and then mitigated through selective translation, significantly reducing the multilingual reasoning gap.

Findings

01

Understanding failures are detectable with supervised methods.

02

Selective translation limits translation to about 20% of inputs.

03

The approach nearly closes the multilingual reasoning gap.
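
The findings above suggest a gating pattern: score each input for a likely understanding failure, and translate only the flagged ones. The following is a minimal sketch of that idea, not the authors' implementation. The names `selective_translation`, `failure_score`, `translate`, and the `threshold` value are hypothetical, the detector and translator are toy stand-ins, and appending the translation to the prompt is an assumption, since this page does not say how the translation is incorporated.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RoutedPrompt:
    text: str         # prompt that would be sent to the reasoning model
    translated: bool  # whether an English translation was attached


def selective_translation(
    question: str,
    failure_score: Callable[[str], float],  # hypothetical detector, higher = more likely failure
    translate: Callable[[str], str],        # hypothetical translator into English
    threshold: float = 0.5,                 # assumed cutoff, not taken from the paper
) -> RoutedPrompt:
    """Attach an English translation only when a failure is predicted."""
    if failure_score(question) >= threshold:
        english = translate(question)
        # Assumption: the translation is simply appended to the original input.
        return RoutedPrompt(f"{question}\n\nEnglish translation: {english}", True)
    return RoutedPrompt(question, False)


def translation_rate(prompts: List[RoutedPrompt]) -> float:
    """Fraction of inputs that were translated."""
    return sum(p.translated for p in prompts) / len(prompts) if prompts else 0.0


# Toy stand-ins: fixed scores and a lookup-table translator.
toy_scores = {"What is 2 plus 3?": 0.1, "2 더하기 3은?": 0.9}
toy_translations = {"2 더하기 3은?": "What is 2 plus 3?"}

questions = list(toy_scores)
routed = [
    selective_translation(
        q,
        failure_score=lambda s: toy_scores[s],
        translate=lambda s: toy_translations.get(s, s),
    )
    for q in questions
]
rate = translation_rate(routed)
```

In this toy run only the second question crosses the threshold, so only one translation call is made. In practice the detector would be a trained classifier and the threshold would trade translation cost against accuracy.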

Abstract

Reasoning language models (RLMs) achieve strong performance on complex reasoning tasks, yet they still exhibit a multilingual reasoning gap, performing better in high-resource languages than in low-resource ones. While recent efforts have been made to address this gap, its underlying causes remain largely unexplored. In this work, we show that this gap primarily stems from failures in language understanding, specifically the model's inability to translate multilingual inputs into the language dominating its reasoning traces (typically English). As identifying understanding failures can enable targeted mitigation of the gap, we evaluate a range of detection methods and find that understanding failures are detectable to a meaningful extent, with supervised approaches performing best. Building on this, we propose Selective Translation, a strategy that incorporates an English translation…

Code & Models

Repositories

deokhk/RLM_analysis (GitHub)

Models

deokhk/mmbert_ft_understandability_Qwen3-4B (Hugging Face model)

Datasets

deokhk/multilingual_reasoning_gap_outputs (dataset, 112 downloads)
