VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration
Jiahui Geng, Qing Li, Zongxiong Chen, Yuxia Wang, Derui Zhu, Zhuohan Xie, Chenyang Lyu, Xiuying Chen, Preslav Nakov, Fakhri Karray

TL;DR
This paper introduces VSCBench, a comprehensive benchmark dataset for evaluating and improving safety calibration in vision-language models, addressing both undersafety and oversafety issues.
Contribution
The paper presents VSCBench, a new dataset of 3,600 image-text pairs for assessing safety calibration, and uses it to evaluate existing models and calibration methods, highlighting the trade-off between safety and utility.
Findings
Existing models exhibit significant undersafety and oversafety issues.
Some calibration methods improve safety but reduce model utility.
The benchmark enables systematic evaluation of safety calibration approaches.
Abstract
The rapid advancement of vision-language models (VLMs) has drawn considerable attention to their safety alignment. However, existing methods have primarily focused on model undersafety, where the model responds to hazardous queries, while neglecting oversafety, where the model refuses to answer safe queries. In this paper, we introduce the concept of safety calibration, which systematically addresses both undersafety and oversafety. Specifically, we present VSCBench, a novel dataset of 3,600 image-text pairs that are visually or textually similar but differ in terms of safety, which is designed to evaluate safety calibration across image-centric and text-centric scenarios. Based on our benchmark, we evaluate safety calibration across eleven widely used VLMs. Our extensive experiments revealed major issues with both undersafety and oversafety. We further investigated…
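To make the two failure modes concrete, here is a minimal sketch of how undersafety and oversafety rates could be tallied from judged model responses. It is an illustration of the definitions in the abstract, not the paper's evaluation code: the `JudgedResponse` record, its field names, and the `calibration_rates` function are all hypothetical, and the sketch assumes each response has already been labelled as a refusal or an answer.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class JudgedResponse:
    """Hypothetical record: one model response to one image-text pair."""
    is_safe_query: bool  # ground-truth safety label of the query
    refused: bool        # True if the model declined to answer


def calibration_rates(records: Iterable[JudgedResponse]) -> Tuple[float, float]:
    """Return (undersafety_rate, oversafety_rate).

    undersafety: share of unsafe queries the model answered anyway.
    oversafety:  share of safe queries the model refused.
    A rate is reported as 0.0 when its query subset is empty.
    """
    records = list(records)
    safe = [r for r in records if r.is_safe_query]
    unsafe = [r for r in records if not r.is_safe_query]

    oversafety = sum(r.refused for r in safe) / len(safe) if safe else 0.0
    undersafety = (
        sum(not r.refused for r in unsafe) / len(unsafe) if unsafe else 0.0
    )
    return undersafety, oversafety


# Toy data (made up for illustration): four safe queries, two unsafe ones.
toy = [
    JudgedResponse(is_safe_query=True, refused=False),
    JudgedResponse(is_safe_query=True, refused=False),
    JudgedResponse(is_safe_query=True, refused=False),
    JudgedResponse(is_safe_query=True, refused=True),    # oversafe refusal
    JudgedResponse(is_safe_query=False, refused=True),
    JudgedResponse(is_safe_query=False, refused=False),  # undersafe answer
]
under, over = calibration_rates(toy)
```

A well-calibrated model drives both rates toward zero at once; lowering only one of them, for example by refusing more often, simply trades undersafety for oversafety.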
Taxonomy
Topics: Advanced Data Processing Techniques · Semantic Web and Ontologies · Fault Detection and Control Systems
Methods: Softmax · Attention Is All You Need
