Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs

Muhammed Saeed; Elgizouli Mohamed; Mukhtar Mohamed; Shaina Raza; Muhammad Abdul-Mageed; Shady Shehata

arXiv:2410.24049 · cs.CL · November 28, 2024

TL;DR

This study evaluates biases against Arabs in six large language models and tests how well the models resist jailbreak prompts that exaggerate negative traits, revealing significant biases and vulnerabilities across models.

Contribution

The paper introduces two new datasets, one for bias evaluation and one for jailbreak safety testing, and provides a comparative analysis of six LLMs' biases and jailbreak vulnerabilities.

Findings

01  79% of cases display negative bias toward Arabs relative to Westerners

02  GPT-4o is the most vulnerable model to jailbreak prompts

03  Claude 3.5 Sonnet is the safest model

Abstract

Large language models (LLMs) are widely used but raise ethical concerns due to embedded social biases. This study examines LLM biases against Arabs versus Westerners across eight domains, including women's rights, terrorism, and anti-Semitism, and assesses model resistance to perpetuating these biases. To this end, we create two datasets: one to evaluate LLM bias toward Arabs versus Westerners and another to test model safety against prompts that exaggerate negative traits ("jailbreaks"). We evaluate six LLMs: GPT-4, GPT-4o, Llama 3.1 (8B & 405B), Mistral 7B, and Claude 3.5 Sonnet. We find 79% of cases displaying negative biases toward Arabs, with Llama 3.1-405B being the most biased. Our jailbreak tests reveal GPT-4o as the most vulnerable, despite being an optimized version, followed by Llama 3.1-8B and Mistral 7B. All LLMs except Claude exhibit attack success rates above 87% in…
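
The abstract reports jailbreak results as attack success rates. As a rough illustration only, the sketch below computes a per-model attack success rate as the fraction of jailbreak prompts judged successful. The function name `attack_success_rate`, the record layout, and the sample labels are assumptions made for this example; they are not the paper's code, datasets, or results, and the paper's own judging procedure may differ.

```python
from collections import defaultdict


def attack_success_rate(records):
    """Return {model: fraction of jailbreak prompts judged successful}.

    `records` is an iterable of (model_name, succeeded) pairs, where
    `succeeded` is True if the model's response was judged a successful
    jailbreak. This layout is hypothetical, chosen for the example.
    """
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for model, succeeded in records:
        attempts[model] += 1
        successes[model] += int(succeeded)
    return {model: successes[model] / attempts[model] for model in attempts}


# Made-up labels for two placeholder models; NOT results from the paper.
records = [
    ("model_a", True), ("model_a", True), ("model_a", False), ("model_a", True),
    ("model_b", False), ("model_b", False), ("model_b", True), ("model_b", False),
]

rates = attack_success_rate(records)
for model, rate in sorted(rates.items()):
    print(f"{model}: {rate:.0%}")
```

Under this definition, a reported rate above 87% would mean that more than 87 of every 100 jailbreak prompts produced a response judged to be a successful attack.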
