ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models
Yunhan Zhao, Zhaorun Chen, Xingjun Ma, Yu-Gang Jiang, Bo Li

TL;DR
This paper introduces ML-Bench, a 14-language safety benchmark grounded in regional regulations, and ML-Guard, a guardrail model for policy-conditioned multilingual safety assessment.
Contribution
It presents ML-Bench, a new policy-grounded multilingual safety benchmark, and ML-Guard, a diffusion-based guardrail model that outperforms 11 baselines across 6 benchmarks.
Findings
ML-Guard outperforms 11 strong baselines across 6 benchmarks.
ML-Bench enables culturally and legally aligned safety evaluation.
ML-Guard comes in two variants: one for fast safety checks and one for detailed compliance assessment.
Abstract
As Large Language Models (LLMs) are increasingly deployed in cross-linguistic contexts, ensuring safety in diverse regulatory and cultural environments has become a critical challenge. However, existing multilingual benchmarks largely rely on general risk taxonomies and machine translation, which confines guardrail models to these predefined categories and hinders their ability to align with region-specific regulations and cultural nuances. To bridge these gaps, we introduce ML-Bench, a policy-grounded multilingual safety benchmark covering 14 languages. ML-Bench is constructed directly from regional regulations, where risk categories and fine-grained rules derived from jurisdiction-specific legal texts are directly used to guide the generation of multilingual safety data, enabling culturally and legally aligned evaluation across languages. Building on ML-Bench, we develop ML-Guard, a…
