Jailbreaking LLMs with Arabic Transliteration and Arabizi

Mansour Al Ghanim; Saleh Almohaimeed; Mengxin Zheng; Yan Solihin; Qian Lou

arXiv:2406.18725 · cs.LG · October 4, 2024 · 1 citation

TL;DR

This paper explores how Arabic transliteration and Arabizi can be used to bypass safety measures in large language models, revealing vulnerabilities that are less apparent in standard Arabic prompts.

Contribution

It demonstrates that Arabic transliteration and Arabizi can effectively jailbreak LLMs, highlighting a new avenue for assessing and improving model safety across diverse language forms.

Findings

1. Prompts written in Arabic transliteration and Arabizi elicited unsafe content.

2. Standard Arabic prompts did not elicit unsafe content, even with prompt manipulation such as prefix injection.

3. Models show increased vulnerability when prompted in non-standard forms of Arabic.

Abstract

This study identifies potential vulnerabilities of Large Language Models (LLMs) to "jailbreak" attacks, focusing specifically on the Arabic language and its various forms. While most research has concentrated on English-based prompt manipulation, our investigation extends the scope to Arabic. We first tested the AdvBench benchmark in Standardized Arabic and found that even prompt manipulation techniques such as prefix injection were insufficient to provoke LLMs into generating unsafe content. However, when using Arabic transliteration and chatspeak (Arabizi), we found that unsafe content could be produced on models such as OpenAI's GPT-4 and Anthropic's Claude 3 Sonnet. Our findings suggest that using Arabic and its various forms could expose information that might otherwise remain hidden, potentially increasing the risk of jailbreak attacks. We hypothesize…
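To make "Arabizi" concrete: in this chatspeak, Latin letters stand in for Arabic letters, and digits whose shapes resemble Arabic letters fill the gaps, e.g. 3 for ع (ʿayn), 7 for ح (ḥāʾ) and 2 for ء (hamza). The sketch below is a minimal, character-by-character converter under an assumed, simplified mapping table (`ARABIZI` and `to_arabizi` are hypothetical names, not taken from the paper or its repository). Real Arabizi varies by region and writer, so this is one common convention, not the mapping the authors used.

```python
# Hypothetical, simplified Arabic-to-Arabizi table. Conventions vary by region
# (e.g. some writers use "kh" for خ instead of 5); this is NOT the paper's mapping.
ARABIZI = {
    "ء": "2", "أ": "2", "إ": "2", "ؤ": "2", "ئ": "2", "آ": "2a", "ا": "a",
    "ب": "b", "ت": "t", "ث": "th", "ج": "j", "ح": "7", "خ": "5",
    "د": "d", "ذ": "th", "ر": "r", "ز": "z", "س": "s", "ش": "sh",
    "ص": "9", "ض": "9'", "ط": "6", "ظ": "6'", "ع": "3", "غ": "3'",
    "ف": "f", "ق": "q", "ك": "k", "ل": "l", "م": "m", "ن": "n",
    "ه": "h", "و": "w", "ي": "y", "ى": "a", "ة": "a",
}

# Arabic diacritics (tanween, harakat, shadda, sukun: U+064B..U+0652) are dropped,
# because casual Arabizi spells vowels loosely rather than from diacritics.
HARAKAT = {chr(c) for c in range(0x064B, 0x0653)}


def to_arabizi(text: str) -> str:
    """Map each Arabic letter to an assumed Arabizi spelling; keep other characters."""
    out = []
    for ch in text:
        if ch in HARAKAT:
            continue  # skip diacritics
        out.append(ARABIZI.get(ch, ch))  # unmapped characters pass through unchanged
    return "".join(out)


print(to_arabizi("مرحبا"))  # mr7ba
```

A purely character-level mapping omits the short vowels that Arabic script usually leaves unwritten, so "مرحبا" comes out as `mr7ba`, whereas a human writer would typically type `mar7aba`. Human Arabizi therefore cannot be reproduced by a lookup table alone.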

Code & Models

Repositories

securedl/arabic_jailbreak (PyTorch, official)

Videos

Jailbreaking LLMs with Arabic Transliteration and Arabizi (Underline)

Taxonomy

Topics: Natural Language Processing Techniques · Library Science and Information Systems · Mathematics, Computing, and Information Processing

Methods: Attention Is All You Need · Softmax · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer