Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes
Mingyang Wang, Lukas Lange, Heike Adel, Yunpu Ma, Jannik Strötgen, Hinrich Schütze

TL;DR
This paper systematically studies language mixing in reasoning language models, revealing how it affects performance, its internal causes, and how controlling reasoning languages can improve accuracy and interpretability.
Contribution
It provides the first comprehensive analysis of language mixing patterns, impacts, and internal causes in RLMs across multiple languages and tasks, and demonstrates how reasoning language choice influences performance.
Findings
Language mixing patterns vary with prompt language, task difficulty, and subject area.
Forcing models to reason in Latin or Han scripts via constrained decoding improves accuracy.
The script composition of reasoning traces closely aligns with the model's internal representations.
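To make "script composition" concrete, here is a minimal Python sketch of one way to measure the share of each writing system in a reasoning trace. It is not the paper's measurement code, and the function names are hypothetical. It makes two assumptions: a letter's script can be approximated by the first word of its Unicode character name, with CJK ideographs grouped as Han, and digits, punctuation and whitespace are ignored. Scripts are coarser than languages, so English and French reasoning would both count as Latin.

```python
import unicodedata
from collections import Counter
from typing import Optional


def char_script(ch: str) -> Optional[str]:
    """Heuristic script label for one character, or None for non-letters.

    Assumption: the script is approximated by the first word of the Unicode
    character name (e.g. "LATIN SMALL LETTER A" -> "Latin"), with CJK
    ideographs grouped as "Han". This is a lightweight stand-in, not the
    Unicode Script property.
    """
    if not ch.isalpha():
        return None  # skip digits, punctuation, whitespace, symbols
    name = unicodedata.name(ch, "")
    if not name:
        return None
    first = name.split()[0]
    return "Han" if first == "CJK" else first.capitalize()


def script_composition(trace: str) -> dict:
    """Fraction of alphabetic characters in `trace` belonging to each script."""
    counts = Counter(s for s in map(char_script, trace) if s is not None)
    total = sum(counts.values())
    if total == 0:
        return {}  # no letters at all, e.g. a purely numeric step
    return {script: n / total for script, n in counts.most_common()}
```

For example, `script_composition("Let x = 2, 所以 x² = 4")` counts 5 Latin letters and 2 Han characters. It therefore returns about 71% Latin and 29% Han. The superscript digit and the punctuation are ignored.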
Abstract
Reasoning language models (RLMs) excel at complex tasks by leveraging a chain-of-thought process to generate structured intermediate steps. However, language mixing, i.e., reasoning steps containing tokens from languages other than the prompt, has been observed in their outputs and shown to affect performance, though its impact remains debated. We present the first systematic study of language mixing in RLMs, examining its patterns, impact, and internal causes across 15 languages, 7 task difficulty levels, and 18 subject areas, and show how all three factors influence language mixing. Moreover, we demonstrate that the choice of reasoning language significantly affects performance: forcing models to reason in Latin or Han scripts via constrained decoding notably improves accuracy. Finally, we show that the script composition of reasoning traces closely aligns with that of the model's…
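Constrained decoding of the kind the abstract describes can be illustrated with logit masking. Before each step, any vocabulary entry that contains letters outside the permitted scripts gets a score of negative infinity, so it can never be selected. The sketch below is a toy that rests on several assumptions: the vocabulary is given as decoded strings, selection is greedy, script detection uses a name-based heuristic, and all helper names are hypothetical. The paper's actual implementation may differ. Real byte-level BPE tokenizers can also split a single Han character across several tokens, which a production version would need to handle.

```python
import math
import unicodedata
from typing import List, Optional, Set


def _script(ch: str) -> Optional[str]:
    # Hypothetical helper: approximate a letter's script from its Unicode name.
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if not name:
        return None
    first = name.split()[0]
    return "Han" if first == "CJK" else first.capitalize()


def allowed_token_mask(vocab: List[str], allowed: Set[str]) -> List[bool]:
    """True for tokens whose letters all fall in the `allowed` scripts.

    Tokens without letters (digits, punctuation, whitespace) have an empty
    script set and stay allowed, so numbers and formulas can still be written.
    """
    mask = []
    for tok in vocab:
        scripts = {s for s in map(_script, tok) if s is not None}
        mask.append(scripts <= allowed)
    return mask


def constrained_greedy_step(logits: List[float], mask: List[bool]) -> int:
    """Return the index of the best token after masking disallowed ones to -inf."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    return max(range(len(masked)), key=masked.__getitem__)
```

Take a toy vocabulary `["the", "所以", "также", "42", " ", "donc"]` with allowed scripts `{"Latin", "Han"}`. The Cyrillic token is masked. With logits `[0.1, 0.3, 2.0, 0.2, 0.0, 0.5]`, the step therefore picks `"donc"` (index 5), even though `"также"` has the highest raw score.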
Taxonomy
Topics: Natural Language Processing Techniques
