IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages

Priyaranjan Pattnayak; Sanchari Chowdhuri

arXiv:2602.16832·cs.AI·February 20, 2026

TL;DR

This paper introduces Indic Jailbreak Robustness (IJR), a judge-free benchmark of jailbreak vulnerabilities across 12 Indic and South Asian languages, and shows that safety evaluations focused solely on English miss significant risks.

Contribution

The paper presents the first judge-free, multilingual benchmark for jailbreak robustness across 12 South Asian languages, highlighting systematic vulnerabilities in current safety assessments.

Findings

01

Contracts increase refusal rates but do not prevent jailbreaks.

02

English-to-Indic attack transfer is strong, with format wrappers often outperforming instruction wrappers.

03

Romanized or mixed inputs reduce jailbreak success rates in the JSON track, and the effect correlates with romanization share and tokenization.

Abstract

Safety alignment of large language models (LLMs) is mostly evaluated in English and in contract-bound settings, leaving multilingual vulnerabilities understudied. We introduce Indic Jailbreak Robustness (IJR), a judge-free benchmark for adversarial safety across 12 Indic and South Asian languages (2.1 billion speakers), covering 45,216 prompts in JSON (contract-bound) and Free (naturalistic) tracks. IJR reveals three patterns. (1) Contracts inflate refusals but do not stop jailbreaks: in JSON, LLaMA and Sarvam exceed 0.92 JSR, and in Free all models reach 1.0 with refusals collapsing. (2) English-to-Indic attacks transfer strongly, with format wrappers often outperforming instruction wrappers. (3) Orthography matters: romanized or mixed inputs reduce JSR under JSON, with correlations to romanization share and tokenization (approximately 0.28 to 0.32) indicating systematic effects. Human audits…
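To make the quantities in the abstract concrete, the following is a minimal sketch of how a jailbreak success rate, a refusal rate, and a correlation with romanization share could be computed from per-prompt outcomes. It assumes JSR denotes jailbreak success rate, measured as the fraction of prompts marked as jailbroken. The record layout, the function names `rate` and `pearson`, and all data values are hypothetical illustrations, not the paper's pipeline or results.

```python
from math import sqrt

def rate(flags):
    """Return the fraction of True values in a list of booleans."""
    return sum(flags) / len(flags) if flags else 0.0

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-prompt records (made-up values, not from the paper):
# (romanization_share, jailbroken, refused)
records = [
    (0.0, True, False),
    (0.2, True, False),
    (0.5, False, True),
    (0.9, False, True),
]

shares = [rec[0] for rec in records]
jailbroken = [rec[1] for rec in records]
refused = [rec[2] for rec in records]

jsr = rate(jailbroken)          # assumed definition: share of jailbroken prompts
refusal_rate = rate(refused)    # share of prompts the model refused
corr = pearson(shares, [float(j) for j in jailbroken])

print(f"JSR={jsr:.2f} refusal={refusal_rate:.2f} corr={corr:.2f}")
```

In this toy data the correlation is negative because the more heavily romanized prompts are the ones that fail to jailbreak; the sign and magnitude in a real evaluation depend entirely on the model outputs.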

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling