Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Model

Siyin Wang; Xingsong Ye; Qinyuan Cheng; Junwen Duan; Shimin Li; Jinlan Fu; Xipeng Qiu; Xuanjing Huang

arXiv:2406.15279·cs.AI·February 18, 2025·1 cites

Open Access · 1 Repo · 2 Datasets

TL;DR

This paper introduces the SIUO benchmark to evaluate cross-modality safety alignment in large vision-language models, revealing significant safety vulnerabilities and highlighting the need for improved safety measures in multi-modal AI systems.

Contribution

It presents a novel safety challenge and benchmark for cross-modality safety, addressing a gap in existing safety evaluations for multi-modal AI models.

Findings

01 Substantial safety vulnerabilities found in current LVLMs

02 Current models struggle with complex, real-world safety scenarios

03 Benchmark covers 9 critical safety domains

Abstract

As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of human life, ensuring the safety and ethical alignment of such systems is paramount. Previous studies primarily focus on single-modality threats, which may not suffice given the integrated and complex nature of cross-modality interactions. We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. Specifically, it considers cases where single modalities are safe independently but could potentially lead to unsafe or unethical outputs when combined. To empirically investigate this problem, we developed the SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and…
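The SIUO condition described in the abstract (each modality safe on its own, the combined output unsafe) can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: the `Case` record, its field names, and the `safe_rate` helper are assumptions, not the benchmark's actual data format, code, or scoring metric.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Case:
    """Hypothetical record for one image-text test case (not the SIUO schema)."""
    image_safe: bool     # image judged safe when viewed alone
    text_safe: bool      # text judged safe when read alone
    response_safe: bool  # model response to the combined input judged safe


def is_siuo_failure(case: Case) -> bool:
    """True when both inputs are safe independently but the response is unsafe."""
    return case.image_safe and case.text_safe and not case.response_safe


def safe_rate(cases: Sequence[Case]) -> float:
    """Fraction of safe-input cases whose response is also safe.

    Cases with an unsafe image or unsafe text are excluded, because they
    fall outside the safe-inputs setting. Returns 0.0 if none qualify.
    """
    eligible = [c for c in cases if c.image_safe and c.text_safe]
    if not eligible:
        return 0.0
    return sum(c.response_safe for c in eligible) / len(eligible)


cases = [
    Case(image_safe=True, text_safe=True, response_safe=False),
    Case(image_safe=True, text_safe=True, response_safe=True),
    Case(image_safe=True, text_safe=False, response_safe=False),
]
```

In this sketch, only the first case counts as a safe-inputs-but-unsafe-output failure; the third is excluded because its text is already unsafe on its own.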

Code & Models

Repositories

sinwang20/siuo (Official)

Datasets

Taxonomy

Topics: Safety Systems Engineering in Autonomy · Risk and Safety Analysis · Software Reliability and Analysis Research

Methods: Focus