Lost in Localization: Building RabakBench with Human-in-the-Loop Validation to Measure Multilingual Safety Gaps

Gabriel Chua; Leanne Tan; Ziyu Ge; Roy Ka-Wei Lee

arXiv:2507.05980 · cs.CL · February 3, 2026

TL;DR

This paper introduces RabakBench, a multilingual safety benchmark for low-resource languages, created through a human-in-the-loop pipeline, revealing significant safety gaps in current LLM guardrails for diverse linguistic communities.

Contribution

We develop RabakBench, a scalable, human-validated safety benchmark tailored to Singapore's multilingual landscape, addressing safety gaps in low-resource language varieties.

Findings

01. State-of-the-art guardrails perform poorly on RabakBench.

02. The benchmark contains over 5,000 examples across six safety categories.

03. Although LLMs are used for scalability, human oversight is retained throughout, and inter-annotator agreement reaches 0.70-0.80.

Abstract

Large language models (LLMs) often fail to maintain safety in low-resource language varieties, such as code-mixed vernaculars and regional dialects. We introduce RabakBench, a multilingual safety benchmark and scalable pipeline localized to Singapore's unique linguistic landscape, covering Singlish, Chinese, Malay, and Tamil. We construct the benchmark through a three-stage pipeline: (1) Generate: augmenting real-world unsafe web content via LLM-driven red teaming; (2) Label: applying semi-automated multi-label annotation using majority-voted LLM labelers; and (3) Translate: performing high-fidelity, toxicity-preserving translation. The resulting dataset contains over 5,000 examples across six fine-grained safety categories. Despite using LLMs for scalability, our framework maintains rigorous human oversight, achieving 0.70-0.80 inter-annotator agreement. Evaluations of 13…
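The Label stage above relies on majority-voted LLM labelers producing multi-label annotations. The sketch below illustrates one plausible reading of that step, assuming each labeler returns a set of category names and a category is kept only when a strict majority of labelers assign it. The function name `majority_vote` and the category names `cat_a`, `cat_b`, and `cat_c` are hypothetical placeholders, not the paper's taxonomy or the authors' implementation.

```python
from collections import Counter
from typing import Iterable, Set


def majority_vote(votes: Iterable[Set[str]]) -> Set[str]:
    """Aggregate multi-label annotations from several labelers.

    Each element of `votes` is the set of categories one labeler assigned
    to a single example. A category is kept only if more than half of the
    labelers assigned it (strict majority). This rule is an assumption.
    """
    votes = [set(v) for v in votes]
    if not votes:
        raise ValueError("at least one labeler vote is required")

    needed = len(votes) // 2 + 1  # strict majority threshold
    counts = Counter(label for labels in votes for label in labels)
    return {label for label, count in counts.items() if count >= needed}


if __name__ == "__main__":
    # Three hypothetical labelers annotate the same example.
    example_votes = [
        {"cat_a", "cat_b"},
        {"cat_a"},
        {"cat_c"},
    ]
    print(sorted(majority_vote(example_votes)))
```

An empty result means no category reached a majority, which in this sketch would be read as the example carrying no agreed label. With an even number of labelers, ties are dropped because the threshold is a strict majority.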

Code & Models

Repositories

govtech-responsibleai/rabakbench (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Hate Speech and Cyberbullying Detection