XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity

Dasol Choi; Eugenia Kim; Jaewon Noh; Sang Seo; Eunmi Kim; Myunggyo Oh; Yunjin Park; Brigitta Jesica Kartono; Josef Pichlmeier; Helena Berndt; Sai Krishna Mendu; Glenn Johannes Tungka; Özlem Gökçe; Suresh Gehlot; Katherine Pratt; Amanda Minnich; Haon Park

arXiv:2605.05662·cs.CL·May 8, 2026

TL;DR

XL-SafetyBench is a comprehensive multilingual benchmark with 5,500 test cases across 10 countries, designed to evaluate LLM safety and cultural sensitivity beyond English-centric assessments.

Contribution

It introduces a novel, multi-stage pipeline for creating culturally grounded safety test cases and provides metrics to distinguish genuine safety from generation failure.

Findings

01. Jailbreak robustness and cultural awareness are not strongly correlated in frontier models.

02. Local models show a near-linear trade-off between attack success rate and neutral-safe rate.

03. Current safety evaluations may conflate generation failure with genuine alignment.

Abstract

Current LLM safety benchmarks are predominantly English-centric and often rely on translation, failing to capture country-specific harms. Moreover, they rarely evaluate a model's ability to detect culturally embedded sensitivities as distinct from universal harms. We introduce XL-SafetyBench, a suite of 5,500 test cases across 10 country-language pairs, comprising a Jailbreak Benchmark of country-grounded adversarial prompts and a Cultural Benchmark where local sensitivities are embedded within innocuous requests. Each item is constructed via a multi-stage pipeline that combines LLM-assisted discovery, automated validation gates, and dual independent native-speaker annotators per country. To distinguish principled refusal from comprehension failure, we evaluate Attack Success Rate (ASR) alongside two complementary metrics we introduce: Neutral-Safe Rate (NSR) and Cultural Sensitivity…
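The distinction between principled refusal and comprehension failure can be illustrated with a small sketch. The text here does not define the metrics, so everything below is an assumption: each response is taken to carry one of three hypothetical judge labels (`harmful`, `refusal`, `neutral`), ASR is computed in the usual way as the share of responses judged harmful, and NSR is assumed to be the share judged `neutral`, meaning safe but not a principled refusal (for example an off-topic or incoherent answer). The paper's actual definitions may differ, and `safety_rates` is an illustrative name, not part of the released code.

```python
from collections import Counter

# Hypothetical judge labels, assumed for illustration only;
# this is not the paper's labeling scheme.
LABELS = ("harmful", "refusal", "neutral")


def safety_rates(labels):
    """Return (asr, nsr) as fractions of all judged responses.

    asr: share of responses labeled "harmful" (attack succeeded).
    nsr: share labeled "neutral" (assumed reading of Neutral-Safe Rate).
    """
    if not labels:
        raise ValueError("need at least one judged response")
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(labels)
    total = len(labels)
    return counts["harmful"] / total, counts["neutral"] / total


# Five judged responses: 1 harmful, 2 refusals, 2 neutral.
judged = ["harmful", "refusal", "neutral", "neutral", "refusal"]
asr, nsr = safety_rates(judged)
print(f"ASR={asr:.2f} NSR={nsr:.2f}")  # prints "ASR=0.20 NSR=0.40"
```

Under this reading, reporting the two rates side by side shows why a low ASR alone is ambiguous: a model whose non-harmful responses are mostly `neutral` rather than `refusal` may simply be failing to produce a usable answer.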

Code & Models

Repositories

aim-intelligence/XL-SafetyBench
github

Datasets

AIM-Intelligence/XL-SafetyBench
dataset · 714 downloads
