MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?
Xirui Li, Hengguang Zhou, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Cho-Jui Hsieh

TL;DR
This paper reveals that advanced multimodal language models often reject harmless queries due to oversensitivity triggered by specific visual stimuli, highlighting a need for improved safety mechanisms.
Contribution
It introduces MOSSBench, a benchmark with 300 benign multimodal queries, to systematically evaluate oversensitivity in 20 state-of-the-art MLLMs, revealing prevalent overcaution issues.
Findings
Oversensitivity is common, with refusal rates up to 76%.
Safer models tend to be more oversensitive.
Different stimuli cause errors at perception, reasoning, and safety judgment stages.
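A refusal rate such as the 76% figure above is the fraction of benign queries that a model declines to answer. The sketch below shows that arithmetic only. It assumes a simple keyword heuristic for spotting refusals; the names `REFUSAL_MARKERS`, `is_refusal`, and `refusal_rate` are hypothetical, and this summary does not say how the paper itself judges refusals.

```python
from typing import Iterable

# Hypothetical refusal markers, chosen for illustration only.
# This is an assumption, not the benchmark's actual judging procedure.
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't", "i am unable")


def is_refusal(response: str) -> bool:
    """Return True if the response contains any assumed refusal marker."""
    text = response.strip().lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(responses: Iterable[str]) -> float:
    """Fraction of responses flagged as refusals (0.0 to 1.0)."""
    items = list(responses)
    if not items:
        raise ValueError("no responses to score")
    refused = sum(is_refusal(r) for r in items)
    return refused / len(items)


if __name__ == "__main__":
    # Made-up responses to benign queries, not data from the paper.
    sample = [
        "I'm sorry, but I can't help with that request.",
        "Sure, the image shows a toy sword on a shelf.",
        "I cannot assist with this.",
        "I am unable to answer questions about this image.",
    ]
    print(f"Refusal rate: {refusal_rate(sample):.0%}")
```

Because every query in the benchmark is benign, any refusal counts as an oversensitive response, so a lower rate is better. A keyword match will miss partial or indirect refusals, which is why it is offered here only as a stand-in.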
Abstract
Humans are prone to cognitive distortions -- biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies. While these models are designed to respond to queries under safety mechanisms, they sometimes reject harmless queries in the presence of certain visual stimuli, disregarding the benign nature of their contexts. As the initial step in investigating this behavior, we identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation. To systematically evaluate MLLMs' oversensitivity to these stimuli, we propose the Multimodal OverSenSitivity Benchmark (MOSSBench). This toolkit consists of 300 manually collected benign multimodal queries,…
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling
