Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models

Ercong Nie; Helmut Schmid; Hinrich Schütze

arXiv:2505.16538·cs.CL·September 19, 2025

TL;DR

This paper investigates the internal mechanisms behind language confusion in English-centric large language models and proposes neuron-level interventions that reduce unintended language switches while largely preserving general competence and fluency.

Contribution

The paper presents the first mechanistic interpretability study of language confusion, combining behavioral benchmarking with neuron-level analysis, and shows that editing a small set of critical neurons mitigates the problem.

Findings

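01 Confusion points (CPs), the specific positions where language switches occur, are central to language confusion.

02 Layer-wise analysis with TunedLens shows that transition failures in the final layers drive confusion.

03 Editing a small set of critical neurons substantially reduces language confusion while largely preserving general competence and fluency.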
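
To make the notion of a confusion point concrete, the following minimal sketch locates the first position in a generated sequence where the language departs from the intended one. It assumes per-token language labels are already available from some language identifier; the function name `find_confusion_point`, the label format, and the rule that unlabeled tokens are language-neutral are all illustrative assumptions, not the paper's definition or tooling.

```python
from typing import Optional, Sequence


def find_confusion_point(
    token_langs: Sequence[Optional[str]], target_lang: str
) -> Optional[int]:
    """Return the index of the first token labeled with a non-target language.

    Assumption: tokens labeled None (e.g. punctuation or digits) are
    language-neutral and are skipped. Returns None if no switch occurs.
    """
    for i, lang in enumerate(token_langs):
        if lang is not None and lang != target_lang:
            return i
    return None


# Hypothetical labels for a response that should be entirely in German.
labels = ["de", "de", None, "de", "en", "en"]
cp = find_confusion_point(labels, target_lang="de")
print(cp)  # index of the first English token
```

A real pipeline would need a token-level or span-level language identifier to produce the labels, which this sketch leaves out.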

Abstract

Language confusion -- where large language models (LLMs) generate unintended languages against the user's need -- remains a critical challenge, especially for English-centric models. We present the first mechanistic interpretability (MI) study of language confusion, combining behavioral benchmarking with neuron-level analysis. Using the Language Confusion Benchmark (LCB), we show that confusion points (CPs) -- specific positions where language switches occur -- are central to this phenomenon. Through layer-wise analysis with TunedLens and targeted neuron attribution, we reveal that transition failures in the final layers drive confusion. We further demonstrate that editing a small set of critical neurons, identified via comparative analysis with a multilingual-tuned counterpart, substantially mitigates confusion while largely preserving general competence and fluency. Our approach…
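
The abstract does not spell out how critical neurons are chosen or edited, so the sketch below is only one plausible reading. It assumes each neuron is summarized by a single activation value in the English-centric model and in its multilingual-tuned counterpart, ranks neurons by the absolute gap between the two, and overwrites the top `k` with the counterpart's values. The function names, the gap criterion, and the replacement rule are hypothetical and may differ from the paper's procedure.

```python
from typing import List, Sequence


def select_critical_neurons(
    base_act: Sequence[float], tuned_act: Sequence[float], k: int
) -> List[int]:
    """Indices of the k neurons with the largest absolute activation gap."""
    if len(base_act) != len(tuned_act):
        raise ValueError("activation vectors must have the same length")
    gaps = [abs(b - t) for b, t in zip(base_act, tuned_act)]
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return sorted(ranked[:k])


def edit_neurons(
    base_act: Sequence[float], tuned_act: Sequence[float], neuron_ids: Sequence[int]
) -> List[float]:
    """Copy of base_act with the selected neurons set to the tuned values."""
    edited = list(base_act)
    for i in neuron_ids:
        edited[i] = tuned_act[i]
    return edited


# Toy activations for four neurons (made-up numbers, for illustration only).
base = [0.1, 2.0, -0.5, 0.3]
tuned = [0.1, 0.2, -0.4, 1.5]
ids = select_critical_neurons(base, tuned, k=2)
edited = edit_neurons(base, tuned, ids)
print(ids, edited)
```

In a real model the edit would be applied to hidden activations during the forward pass, typically through a hook on the relevant layers, rather than to a static list.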

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications

Methods: Sparse Evolutionary Training