Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes

Mingyang Wang; Lukas Lange; Heike Adel; Yunpu Ma; Jannik Strötgen; Hinrich Schütze

arXiv:2505.14815 · cs.CL · September 22, 2025

TL;DR

This paper systematically studies language mixing in reasoning language models, revealing how it affects performance, its internal causes, and how controlling reasoning languages can improve accuracy and interpretability.

Contribution

The paper provides the first comprehensive analysis of language mixing in reasoning language models (RLMs), covering its patterns, impact, and internal causes across multiple languages and tasks, and demonstrates that the choice of reasoning language influences performance.

Findings

01

Language mixing patterns vary with task difficulty and subject area.

02

Forcing reasoning in Latin or Han scripts improves model accuracy.

03

Internal representations align with script composition of reasoning traces.
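The "script composition" of a reasoning trace mentioned in the findings can be illustrated with a minimal sketch. The code below is an assumption for illustration, not the authors' measurement code: the names `char_script` and `script_composition` are hypothetical, and the script label is guessed from the Unicode character name, which is only a rough stand-in for the Unicode Script property.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from Unicode character-name prefixes to script labels.
# This is a heuristic chosen for the sketch, not a full Script property lookup.
_PREFIXES = (
    ("CJK", "Han"),
    ("LATIN", "Latin"),
    ("CYRILLIC", "Cyrillic"),
    ("ARABIC", "Arabic"),
    ("DEVANAGARI", "Devanagari"),
    ("HANGUL", "Hangul"),
    ("HIRAGANA", "Kana"),
    ("KATAKANA", "Kana"),
)


def char_script(ch):
    """Return a rough script label for one character, or None for non-letters."""
    if not ch.isalpha():
        return None  # digits, punctuation, and whitespace are ignored
    name = unicodedata.name(ch, "")
    for prefix, label in _PREFIXES:
        if name.startswith(prefix):
            return label
    return "Other"


def script_composition(text):
    """Return the fraction of letters in `text` that belong to each script."""
    counts = Counter(s for s in map(char_script, text) if s is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {script: n / total for script, n in counts.items()}


if __name__ == "__main__":
    trace = "First, 我们 compute 2 + 2."
    print(script_composition(trace))
```

Under these assumptions, a trace whose letters are mostly Latin with a few Han characters yields a distribution dominated by "Latin" with a small "Han" share, which is the kind of per-trace summary the finding refers to.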

Abstract

Reasoning language models (RLMs) excel at complex tasks by leveraging a chain-of-thought process to generate structured intermediate steps. However, language mixing, i.e., reasoning steps containing tokens from languages other than the prompt, has been observed in their outputs and shown to affect performance, though its impact remains debated. We present the first systematic study of language mixing in RLMs, examining its patterns, impact, and internal causes across 15 languages, 7 task difficulty levels, and 18 subject areas, and show how all three factors influence language mixing. Moreover, we demonstrate that the choice of reasoning language significantly affects performance: forcing models to reason in Latin or Han scripts via constrained decoding notably improves accuracy. Finally, we show that the script composition of reasoning traces closely aligns with that of the model's…
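The abstract states that models are forced to reason in a given script via constrained decoding but this excerpt does not describe the mechanism. One common way to realize such a constraint is to mask the logits of tokens that contain letters outside the target script before picking the next token. The sketch below assumes that approach; the toy vocabulary, the function names `in_script`, `mask_logits`, and `greedy_pick`, and the Unicode-name heuristic are all hypothetical and chosen only for illustration. Real tokenizers often use byte-level tokens, which would need decoding before such a check.

```python
import math
import unicodedata


def in_script(token, name_prefix):
    """True if every letter in `token` has a Unicode name starting with `name_prefix`.

    Tokens without letters (digits, punctuation) are always allowed.
    """
    return all(
        unicodedata.name(ch, "").startswith(name_prefix)
        for ch in token
        if ch.isalpha()
    )


def mask_logits(logits, vocab, name_prefix):
    """Set the logit of every token outside the target script to -inf."""
    return [
        score if in_script(tok, name_prefix) else -math.inf
        for score, tok in zip(logits, vocab)
    ]


def greedy_pick(logits, vocab, name_prefix):
    """Pick the highest-scoring token that stays within the target script."""
    masked = mask_logits(logits, vocab, name_prefix)
    best = max(range(len(masked)), key=masked.__getitem__)
    return vocab[best]


if __name__ == "__main__":
    # Toy vocabulary and scores, invented for this example.
    vocab = ["the", "答案", "42", "ответ"]
    logits = [1.0, 3.0, 0.5, 2.0]
    print(greedy_pick(logits, vocab, "LATIN"))  # Latin-only decoding
    print(greedy_pick(logits, vocab, "CJK"))    # Han-only decoding
```

Masking at the logit level keeps the rest of the decoding loop unchanged, so the same idea carries over to sampling or beam search; whether the paper uses this exact scheme is not stated in the text above.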

Code & Models

Datasets

mingyang26/knights-and-knaves-multilingual
dataset · 24 downloads

Videos

Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes

Taxonomy

Topics: Natural Language Processing Techniques