Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding

Haneul Yoo; Yongjin Yang; Hwaran Lee

arXiv:2406.15481 · cs.AI · June 12, 2025 · 1 citation

TL;DR

This paper introduces CSRT, a framework for generating code-switching queries to evaluate LLM safety and multilingual understanding, revealing that code-switching can effectively expose undesirable behaviors in LLMs.

Contribution

The paper presents CSRT, a novel code-switching red-teaming framework that outperforms existing multilingual red-teaming techniques, and uses it to examine the multilingual capabilities and safety of LLMs.

Findings

01. CSRT achieves 46.7% more successful attacks than standard English attacks.

02. CSRT is effective for safety evaluation across multiple languages.

03. CSRT also probes LLMs' multilingual understanding and generation capabilities.
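
The headline figure in finding 01 compares attack success between two query sets. The listing does not say whether "46.7% more" is a relative or an absolute gain, nor how the authors score a response as unsafe, so the sketch below is only an illustration under stated assumptions: each query gets a boolean label (True if the attack succeeded), and both kinds of gain are reported side by side. The function names and the toy labels are hypothetical and are not taken from the paper or its repository.

```python
from typing import Dict, Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of queries whose response was judged unsafe (True = attack succeeded)."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return sum(outcomes) / len(outcomes)


def compare_attacks(baseline: Sequence[bool], candidate: Sequence[bool]) -> Dict[str, float]:
    """Report both absolute and relative gain of candidate over baseline."""
    base = attack_success_rate(baseline)
    cand = attack_success_rate(candidate)
    return {
        "baseline_asr": base,
        "candidate_asr": cand,
        # Difference in rates (percentage points when multiplied by 100).
        "absolute_gain": cand - base,
        # Gain expressed as a fraction of the baseline rate.
        "relative_gain": (cand - base) / base if base > 0 else float("inf"),
    }


# Toy labels only (hypothetical, NOT results from the paper).
english_only = [True, False, False, False, True, False, False, False, False, False]
code_switched = [True, True, False, False, True, False, False, False, False, False]

result = compare_attacks(english_only, code_switched)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Reporting both numbers avoids ambiguity: the same pair of rates can look like a small gain in percentage points and a large gain relative to the baseline.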

Abstract

As large language models (LLMs) have advanced rapidly, concerns regarding their safety have become prominent. In this paper, we discover that code-switching, a common practice in natural language, can effectively elicit undesirable behaviors from LLMs when used in red-teaming queries. We introduce a simple yet effective framework, CSRT, to synthesize code-switching red-teaming queries and comprehensively investigate the safety and multilingual understanding of LLMs. Through extensive experiments with ten state-of-the-art LLMs and code-switching queries combining up to 10 languages, we demonstrate that CSRT significantly outperforms existing multilingual red-teaming techniques, achieving 46.7% more successful attacks than standard attacks in English and remaining effective in conventional safety domains. We also examine the multilingual ability of those LLMs to generate and understand code-switching…

Code & Models

Repositories

haneul-yoo/csrt · Official

Datasets

walledai/CSRT
dataset · 20 downloads

Videos

Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding

Taxonomy

Topics: Semantic Web and Ontologies · Digital Rights Management and Security