Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs

Wenxuan Wang; Xiaoyuan Liu; Kuiyi Gao; Jen-tse Huang; Youliang Yuan; Pinjia He; Shuai Wang; Zhaopeng Tu

arXiv:2502.11184·cs.CL·June 4, 2025

TL;DR

This paper introduces MMSafeAware, a benchmark for evaluating multimodal safety awareness in large language models, revealing current models' safety limitations and the challenges in improving their safety capabilities.

Contribution

The paper presents the first comprehensive multimodal safety awareness benchmark, evaluates existing models on it, and tests methods for improving safety awareness, highlighting the significant safety challenges that remain.

Findings

01. Current MLLMs often misclassify unsafe content as safe.

02. Models tend to be overly sensitive, mislabeling benign content as unsafe.

03. Proposed safety improvement methods did not achieve satisfactory results.

Abstract

Multimodal Large Language Models (MLLMs) have expanded the capabilities of traditional language models by enabling interaction through both text and images. However, ensuring the safety of these models remains a significant challenge, particularly in accurately identifying whether multimodal content is safe or unsafe, a capability we term safety awareness. In this paper, we introduce MMSafeAware, the first comprehensive multimodal safety awareness benchmark designed to evaluate MLLMs across 29 safety scenarios with 1500 carefully curated image-prompt pairs. MMSafeAware includes both unsafe and over-safety subsets to assess models' abilities to correctly identify unsafe content and to avoid over-sensitivity that can hinder helpfulness. Evaluating nine widely used MLLMs using MMSafeAware reveals that current models are not sufficiently safe and often overly sensitive; for example, GPT-4V…
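The two-subset evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes each benchmark item carries a subset name and that the model's judgment has already been reduced to the string "safe" or "unsafe". The function name `subset_accuracy`, the subset keys, and the toy records are all hypothetical, and the numbers are made up for illustration rather than taken from the paper.

```python
from collections import defaultdict

# Assumed ground truth per subset (an illustration, not the paper's schema):
# items in the unsafe subset should be judged "unsafe", while items in the
# over-safety subset are benign and should be judged "safe".
EXPECTED_LABEL = {"unsafe": "unsafe", "over_safety": "safe"}


def subset_accuracy(records):
    """Return the fraction of correct safe/unsafe judgments for each subset.

    `records` is an iterable of (subset, predicted_label) pairs.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subset, predicted in records:
        total[subset] += 1
        if predicted == EXPECTED_LABEL[subset]:
            correct[subset] += 1
    return {subset: correct[subset] / total[subset] for subset in total}


# Toy, made-up predictions; these are not results from the paper.
toy_records = [
    ("unsafe", "unsafe"),
    ("unsafe", "safe"),         # unsafe content missed by the model
    ("over_safety", "safe"),
    ("over_safety", "unsafe"),  # benign content flagged as unsafe
    ("over_safety", "safe"),
    ("over_safety", "safe"),
]
scores = subset_accuracy(toy_records)
print(scores)
```

Reporting the two subsets separately matters because a model that labels everything "unsafe" would score perfectly on the unsafe subset while failing the over-safety subset entirely.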

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Semantic Web and Ontologies