Understanding and Mitigating Language Confusion in LLMs

Kelly Marchisio, Wei-Yin Ko, Alexandre Bérard, Théo Dehaze, Sebastian Ruder

arXiv:2406.20052 · cs.CL · April 7, 2025

TL;DR

This paper introduces the Language Confusion Benchmark to evaluate whether LLMs generate text in the user's desired language. It finds that confusion is widespread, especially for complex prompts, and explores mitigation strategies.

Contribution

The paper presents the first comprehensive benchmark for language confusion in LLMs and analyzes factors affecting language accuracy, along with mitigation techniques.

Findings

01

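Even the strongest LLMs fail to respond consistently in the user's desired language; Llama Instruct and Mistral models show high degrees of language confusion.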

02

Language confusion increases with prompt complexity and sampling temperature.

03

Few-shot prompting, multilingual SFT, and preference tuning partially mitigate language confusion.

Abstract

We investigate a surprising limitation of LLMs: their inability to consistently generate text in a user's desired language. We create the Language Confusion Benchmark (LCB) to evaluate such failures, covering 15 typologically diverse languages with existing and newly-created English and multilingual prompts. We evaluate a range of LLMs on monolingual and cross-lingual generation reflecting practical use cases, finding that Llama Instruct and Mistral models exhibit high degrees of language confusion and even the strongest models fail to consistently respond in the correct language. We observe that base and English-centric instruct models are more prone to language confusion, which is aggravated by complex prompts and high sampling temperatures. We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning. We release our…
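To make the idea of checking whether a response is in the desired language concrete, here is a minimal sketch. It is not the benchmark's metric or implementation: it assumes a toy detector that only looks at the dominant Unicode script of a response, and the names `EXPECTED_SCRIPT`, `dominant_script`, and `response_pass_rate` are hypothetical, introduced purely for illustration.

```python
import unicodedata

# Hypothetical mapping from a language code to the first word of the
# Unicode character names used by that language's script (an assumption
# made for this sketch, not something defined by the paper).
EXPECTED_SCRIPT = {
    "en": "LATIN",
    "ru": "CYRILLIC",
    "ar": "ARABIC",
    "el": "GREEK",
}


def dominant_script(text):
    """Return the most frequent script among alphabetic characters, or None."""
    counts = {}
    for ch in text:
        if not ch.isalpha():
            continue  # ignore digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        script = name.split(" ")[0] if name else "UNKNOWN"
        counts[script] = counts.get(script, 0) + 1
    if not counts:
        return None
    return max(counts, key=counts.get)


def response_pass_rate(responses, target_lang):
    """Fraction of responses whose dominant script matches the target language."""
    if not responses:
        return 0.0
    expected = EXPECTED_SCRIPT[target_lang]
    passed = sum(1 for r in responses if dominant_script(r) == expected)
    return passed / len(responses)


if __name__ == "__main__":
    outputs = [
        "Привет, как дела?",       # Russian, as requested
        "Hello, how are you?",     # English reply to a Russian request
        "Это ответ на русском.",   # Russian, as requested
    ]
    print(response_pass_rate(outputs, "ru"))
```

A script-level check like this cannot separate languages that share a script (English and French, for example), so a real evaluation would need a proper language identification model in place of `dominant_script`.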

Code & Models

Repositories

for-ai/language-confusion (Official)

Videos

Understanding and Mitigating Language Confusion in LLMs · underline

Taxonomy

Topics: Interpreting and Communication in Healthcare · Translation Studies and Practices

Methods: Balanced Selection · Shrink and Fine-Tune · LLaMA