Can Large Language Models Understand, Reason About, and Generate Code-Switched Text?

Genta Indra Winata; David Anugraha; Patrick Amadeus Irawan; Anirban Das; Haneul Yoo; Paresh Dashore; Shreyas Kulkarni; Ruochen Zhang; Haruki Sakajo; Frederikus Hudi; Anaelia Ovalle; Syrielle Montariol; Felix Gaschi; Michael Anugraha; Rutuj Ravindra Puranik; Zawad Hayat Ahmed; Adril Putra Merin; Emmanuele Chersoni

arXiv:2601.07153 · cs.CL · January 13, 2026


TL;DR

This paper evaluates large language models' abilities to understand, reason about, and generate code-switched text, revealing significant challenges and providing a new benchmark and insights for improving multilingual LLMs.

Contribution

It introduces CodeMixQA, a human-annotated benchmark for code-switched question answering, and analyzes LLM performance and limitations in understanding, reasoning over, and generating code-switched text.

Findings

01. LLMs struggle with reasoning over code-switched text
02. Synthetic code-switched text generation has notable limitations
03. The benchmark reveals persistent challenges in multilingual LLMs

Abstract

Code-switching is a pervasive phenomenon in multilingual communication, yet the robustness of large language models (LLMs) in mixed-language settings remains insufficiently understood. In this work, we present a comprehensive evaluation of LLM capabilities in understanding, reasoning over, and generating code-switched text. We introduce CodeMixQA, a novel benchmark with high-quality human annotations, comprising 16 diverse parallel code-switched language-pair variants that span multiple geographic regions and code-switching patterns and include both original scripts and their transliterated forms. Using this benchmark, we analyze the reasoning behavior of LLMs on code-switched question-answering tasks, shedding light on how models process and reason over mixed-language inputs. We further conduct a systematic evaluation of LLM-generated synthetic code-switched text, focusing on both…


Code & Models

Datasets

gentaiscool/codemixqa (dataset)


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification