IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages
Priyaranjan Pattnayak, Sanchari Chowdhuri

TL;DR
This paper introduces Indic Jailbreak Robustness (IJR), a comprehensive benchmark assessing multilingual jailbreak vulnerabilities in South Asian languages, revealing significant risks in safety evaluations that focus solely on English.
Contribution
The paper presents the first judge-free, multilingual benchmark for jailbreak robustness across 12 South Asian languages, highlighting systematic vulnerabilities in current safety assessments.
Findings
Output contracts (the JSON track) increase refusal rates but do not prevent jailbreaks.
English-to-Indic attack transfer is strong, with format wrappers often outperforming instruction wrappers.
Romanized or mixed-script inputs reduce jailbreak success rates under JSON; the effect correlates with romanization share and tokenization (approximately 0.28 to 0.32).
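The third finding relates jailbreak success rate (JSR) to romanization share through a correlation coefficient. The following is a minimal sketch of that kind of computation, not the paper's method. It assumes JSR is simply the fraction of attack prompts marked successful and uses a Pearson correlation. The function names and the toy data are hypothetical and do not come from the benchmark.

```python
from math import sqrt


def jailbreak_success_rate(outcomes):
    """Fraction of attack prompts marked successful (True).

    Assumption: JSR is a plain success fraction; the paper's exact
    definition is not given in this summary.
    """
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return sum(outcomes) / len(outcomes)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data, NOT results from the paper:
# language label -> (romanization share, per-prompt attack outcomes)
toy = {
    "lang_a": (0.1, [True, True, True, False]),
    "lang_b": (0.5, [True, True, False, False]),
    "lang_c": (0.9, [True, False, False, False]),
}

shares = [share for share, _ in toy.values()]
jsrs = [jailbreak_success_rate(outcomes) for _, outcomes in toy.values()]
r = pearson(shares, jsrs)
print(f"JSR per language: {jsrs}, correlation with romanization share: {r:.2f}")
```

The toy values are chosen only to make the arithmetic easy to follow; they say nothing about the strength or sign of the correlations reported in the abstract.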
Abstract
Safety alignment of large language models (LLMs) is mostly evaluated in English and in contract-bound settings, leaving multilingual vulnerabilities understudied. We introduce Indic Jailbreak Robustness (IJR), a judge-free benchmark for adversarial safety across 12 Indic and South Asian languages (2.1 billion speakers), covering 45,216 prompts in JSON (contract-bound) and Free (naturalistic) tracks. IJR reveals three patterns. (1) Contracts inflate refusals but do not stop jailbreaks: in JSON, LLaMA and Sarvam exceed a jailbreak success rate (JSR) of 0.92, and in Free all models reach 1.0 with refusals collapsing. (2) English-to-Indic attacks transfer strongly, with format wrappers often outperforming instruction wrappers. (3) Orthography matters: romanized or mixed inputs reduce JSR under JSON, with correlations to romanization share and tokenization (approximately 0.28 to 0.32) indicating systematic effects. Human audits…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling
