Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation
Collin Zhang, Fei Huang, Chenhan Yuan, Junyang Lin

TL;DR
The paper proposes the Language Confusion Gate, a lightweight decoding filter that reduces language mixing in large language models without retraining, using self-distillation to selectively mask tokens based on language predictions.
Contribution
It introduces a novel, plug-in decoding method that effectively mitigates language confusion in LLMs without retraining or affecting task performance.
Findings
Significantly reduces language confusion across various models.
Does not negatively impact task performance.
Operates as a lightweight, plug-in solution.
Abstract
Large language models (LLMs) often experience language confusion, which is the unintended mixing of languages during text generation. Current solutions to this problem either necessitate model retraining or cannot differentiate between harmful confusion and acceptable code-switching. This paper introduces the Language Confusion Gate (LCG), a lightweight, plug-in solution that filters tokens during decoding without altering the base LLM. The LCG is trained using norm-adjusted self-distillation to predict appropriate language families and apply masking only when needed. Our method is based on the findings that language confusion is infrequent, correct-language tokens are usually among the top predictions, and output token embedding norms are larger for high-resource languages, which biases sampling. When evaluated across various models, including Qwen3, GPT-OSS, Gemma3, Llama3.1, LCG…
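The decoding-time filtering described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the `token_family` bucketing rule, the family names (borrowed from the CJ / Latin / Symbols / Low-Res grouping mentioned in the reviews), and the idea that the gate's output arrives as a set of allowed families. The trained gate itself and the norm-adjusted self-distillation objective are not reproduced.

```python
import math

def token_family(token: str) -> str:
    """Hypothetical script-based bucketing of a token into a language family.

    Assumption: the paper's actual assignment rule is not given in this summary.
    """
    for ch in token:
        cp = ord(ch)
        # Hiragana/Katakana (U+3040-U+30FF) or CJK Unified Ideographs (U+4E00-U+9FFF)
        if 0x3040 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF:
            return "CJ"
    letters = [ch for ch in token if ch.isalpha()]
    if not letters:
        return "Symbols"  # digits, punctuation, whitespace
    if all(ord(ch) < 0x0250 for ch in letters):
        return "Latin"    # Basic Latin through Latin Extended-B
    return "LowRes"       # catch-all bucket for every other script

def gate_logits(logits, vocab, allowed):
    """Set logits of tokens outside the allowed families to -inf.

    `allowed` stands in for the gate's prediction. If it covers every
    family, the mask is a no-op, so legitimate code-switching is untouched.
    """
    keep = [token_family(t) in allowed for t in vocab]
    if not any(keep):
        return list(logits)  # never mask the whole vocabulary
    return [x if k else -math.inf for x, k in zip(logits, keep)]

def softmax(logits):
    """Numerically stable softmax; masked (-inf) entries get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and logits (made-up values for illustration only).
vocab = ["hello", "你好", "こん", "!", "привет"]
logits = [2.0, 2.5, 0.1, 1.0, 0.3]

# Suppose the gate predicts that only Latin and Symbols are appropriate here.
gated = gate_logits(logits, vocab, allowed={"Latin", "Symbols"})
probs = softmax(gated)
best = vocab[probs.index(max(probs))]
```

In this toy case the highest unmasked logit belongs to a CJ token; after gating, the probability mass is redistributed over the Latin and Symbols tokens only, and the base model's logits for those tokens are unchanged.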
Peer Reviews
Decision: ICLR 2026 Poster
1. The authors propose an effective fix for the problem of language confusion, grounded in reliable observations. 2. The method is simple and intuitive. 3. Because the method does not require modifying the base model, it alleviates concerns about potential forgetting.
1. In Section 5.4, there is a slight performance degradation on the Qwen series once LCG is applied (though none on GPT-OSS). Further error analysis on this point is needed, for example a qualitative analysis of some reasoning traces: does LCG suppress diversity or exploration in language models? 2. In light of the above, more experiments evaluating general capabilities after applying LCG would be useful.
1. The proposed LCG is a practical and lightweight solution, avoiding the need for model retraining and adding minimal computational overhead. 2. The norm-adjusted self-distillation method effectively mitigates the bias towards high-resource languages caused by token embedding norm imbalance. 3. The work distinguishes between harmful language confusion and legitimate code-switching, ensuring the model retains necessary multilingual expression capabilities.
1. LCG only classifies tokens into broad language families (CJ, Latin, Symbols, Low-Res), failing to resolve confusion between languages sharing the same script (e.g., Chinese vs. Japanese). 2. The evaluation’s reliance on rule-based detectors for language confusion may have limitations, especially in complex multilingual contexts. 3. The paper does not discuss the generalization of LCG to more low-resource languages beyond the tested ones.
1. The paper introduces a norm-adjusted self-distillation training signal for a language-family gate. I consider it a very simple idea, distinct from weight edits or reward fine-tuning. 2. The paper demonstrates empirical gains of the proposed method across open and closed models.
1. The paper calculates a “code-switch rate” on FLORES-WITH-LATIN but does not include human judgments or LLM-as-a-judge evaluation of the responses. It could offer more evaluation of whether the improved responses are also preferred by humans. 2. Are there cases where the proposed method does not help correct language confusion, or even makes it worse? It would be helpful to showcase some negative cases as well.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning
