Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture
Jiayang Song, Yuheng Huang, Zhehua Zhou, Lei Ma

TL;DR
This paper introduces Multilingual Blending, a novel evaluation scheme for assessing LLM safety alignment in complex multilingual scenarios, revealing significant vulnerabilities and the influence of linguistic properties on safety bypass rates.
Contribution
It proposes a new multilingual evaluation method for LLM safety, highlighting the impact of language features on safety alignment robustness in diverse linguistic contexts.
Findings
Multilingual Blending increases safety bypass rates significantly.
Queries blending languages that differ in morphology or come from diverse language families are more likely to evade safety measures.
Without careful prompts, safety vulnerabilities are amplified in multilingual settings.
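This summary does not say how the safety bypass rate is computed. One common convention, assumed here rather than taken from the paper, is the fraction of harmful queries whose responses a judge marks as unsafe, computed separately per language setting. The `Trial` record, the setting labels, and the toy verdicts below are hypothetical placeholders, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One harmful query sent to a model, plus a judge's verdict on the response (hypothetical schema)."""
    query_id: str
    language_setting: str  # hypothetical labels, e.g. "monolingual" or "blended"
    judged_unsafe: bool    # True if the response was judged to comply with the harmful request


def bypass_rate(trials: list[Trial], setting: str) -> float:
    """Fraction of trials in `setting` whose responses were judged unsafe."""
    subset = [t for t in trials if t.language_setting == setting]
    if not subset:
        raise ValueError(f"no trials for setting {setting!r}")
    # sum() over booleans counts the True verdicts
    return sum(t.judged_unsafe for t in subset) / len(subset)


# Toy verdicts for illustration only; these are not data from the paper.
trials = [
    Trial("q1", "monolingual", False),
    Trial("q2", "monolingual", True),
    Trial("q1", "blended", True),
    Trial("q2", "blended", True),
]
mono_rate = bypass_rate(trials, "monolingual")
blended_rate = bypass_rate(trials, "blended")
```

Keeping the same query IDs across settings lets the two rates be compared query by query. How a response is judged unsafe (human raters, a classifier, or an LLM judge) is left open here.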
Abstract
As safety remains a crucial concern throughout the development lifecycle of Large Language Models (LLMs), researchers and industrial practitioners have increasingly focused on safeguarding and aligning LLM behaviors with human preferences and ethical standards. LLMs, trained on extensive multilingual corpora, exhibit powerful generalization abilities across diverse languages and domains. However, current safety alignment practices predominantly focus on single-language scenarios, which leaves their effectiveness in complex multilingual contexts, especially mixed-language formats, largely unexplored. In this study, we introduce Multilingual Blending, a mixed-language query-response scheme designed to evaluate the safety alignment of various state-of-the-art LLMs (e.g., GPT-4o, GPT-3.5, Llama3) under sophisticated, multilingual conditions. We further investigate language…
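To make the idea of a mixed-language query-response concrete, the sketch below renders consecutive segments of a query in different languages and appends an instruction asking for a mixed-language answer. It assumes a segment-level, round-robin language assignment, which may differ from the authors' actual construction. The `translate` callable is a hypothetical stand-in for any machine-translation backend, and the stub used here only tags each segment with its target language code.

```python
from typing import Callable, Sequence

# Hypothetical translator signature: (text, target_language_code) -> translated text.
Translator = Callable[[str, str], str]


def blend_query(segments: Sequence[str], languages: Sequence[str], translate: Translator) -> str:
    """Render each segment in a language chosen round-robin from `languages`."""
    if not languages:
        raise ValueError("need at least one language")
    parts = [translate(seg, languages[i % len(languages)]) for i, seg in enumerate(segments)]
    return " ".join(parts)


def blend_instruction(languages: Sequence[str]) -> str:
    """Hypothetical response-side instruction asking for a mixed-language answer."""
    return "Please answer using a mixture of: " + ", ".join(languages) + "."


def stub_translate(text: str, lang: str) -> str:
    """Placeholder 'translator' that only tags the segment with its target language."""
    return f"[{lang}] {text}"


# Benign example query, split into segments by hand.
query = blend_query(["How do I", "bake bread", "at home?"], ["de", "zh"], stub_translate)
prompt = query + "\n" + blend_instruction(["de", "zh"])
```

With the stub, `query` becomes `[de] How do I [zh] bake bread [de] at home?`. A real evaluation would swap in an actual translator and decide how to segment queries (by word, phrase, or sentence), which is exactly where the linguistic properties mentioned above could come into play.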
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Translation Studies and Practices
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Adam · Dropout · Weight Decay · Multi-Head Attention · Dense Connections
