Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture

Jiayang Song; Yuheng Huang; Zhehua Zhou; Lei Ma

arXiv:2407.07342·cs.CL·July 11, 2024·1 cites

TL;DR

This paper introduces Multilingual Blending, a novel evaluation scheme for assessing LLM safety alignment in complex multilingual scenarios, revealing significant vulnerabilities and the influence of linguistic properties on safety bypass rates.

Contribution

It proposes a new multilingual evaluation method for LLM safety, highlighting the impact of language features on safety alignment robustness in diverse linguistic contexts.

Findings

01 Multilingual Blending significantly increases safety bypass rates compared with single-language queries.

02 Mixtures of languages with differing morphology and from different language families are more likely to evade safety alignment.

03 Even without carefully crafted prompt templates, mixed-language queries amplify safety vulnerabilities.

Abstract

As safety remains a crucial concern throughout the development lifecycle of Large Language Models (LLMs), researchers and industrial practitioners have increasingly focused on safeguarding LLM behaviors and aligning them with human preferences and ethical standards. LLMs, trained on extensive multilingual corpora, exhibit powerful generalization abilities across diverse languages and domains. However, current safety alignment practices predominantly focus on single-language scenarios, which leaves their effectiveness in complex multilingual contexts, especially mixed-language formats, largely unexplored. In this study, we introduce Multilingual Blending, a mixed-language query-response scheme designed to evaluate the safety alignment of various state-of-the-art LLMs (e.g., GPT-4o, GPT-3.5, Llama3) under sophisticated, multilingual conditions. We further investigate language…

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Translation Studies and Practices

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Adam · Dropout · Weight Decay · Multi-Head Attention · Dense Connections