OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation

Xiaojun Jia; Jie Liao; Qi Guo; Teng Ma; Simeng Qin; Ranjie Duan; Tianlin Li; Yihao Huang; Zhitao Zeng; Dongxian Wu; Yiming Li; Wenqi Ren; Xiaochun Cao; Yang Liu

arXiv:2512.06589·cs.CR·December 9, 2025

Open Access

TL;DR

OmniSafeBench-MM is a comprehensive, standardized benchmark and toolbox designed to evaluate the vulnerability and defense strategies of multi-modal large language models against jailbreak attacks across diverse scenarios.

Contribution

It introduces a unified, reproducible platform with extensive attack methods, defense strategies, and evaluation protocols for multi-modal jailbreak safety assessment.

Findings

01. Revealed vulnerabilities of multiple MLLMs to jailbreak attacks.
02. Provided a standardized evaluation protocol for multi-modal safety.
03. Enabled comparison of attack and defense effectiveness across models.

Abstract

Recent advances in multi-modal large language models (MLLMs) have enabled unified perception-reasoning capabilities, yet these systems remain highly vulnerable to jailbreak attacks that bypass safety alignment and induce harmful behaviors. Existing benchmarks such as JailBreakV-28K, MM-SafetyBench, and HADES provide valuable insights into multi-modal vulnerabilities, but they typically focus on limited attack scenarios, lack standardized defense evaluation, and offer no unified, reproducible toolbox. To address these gaps, we introduce OmniSafeBench-MM, a comprehensive toolbox for multi-modal jailbreak attack-defense evaluation. OmniSafeBench-MM integrates 13 representative attack methods, 15 defense strategies, and a diverse dataset spanning 9 major risk domains and 50 fine-grained categories, structured across consultative, imperative, and declarative inquiry types to reflect…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Information and Cyber Security