MSTS: A Multimodal Safety Test Suite for Vision-Language Models
Paul Röttger, Giuseppe Attanasio, Felix Friedrich, Janis Goldzycher, Alicia Parrish, Rishabh Bhardwaj, Chiara Di Bonaventura, Roman Eng, Gaia El Khoury Geagea, Sujata Goswami, Jieun Han, Dirk Hovy, Seogyeong Jeong, Paloma Jeretič, Flor Miriam Plaza-del-Arco

TL;DR
This paper introduces MSTS, a comprehensive multimodal safety test suite for vision-language models, revealing significant safety issues and highlighting the need for improved safety evaluation methods.
Contribution
The paper presents MSTS, a novel multimodal safety testing framework with 400 prompts across 40 hazard categories, and analyzes safety gaps in current VLMs.
Findings
Several open VLMs exhibit safety issues.
Some models are safe only by failing to understand prompts.
Non-English prompts increase the rate of unsafe responses.
Abstract
Vision-language models (VLMs), which process image and text inputs, are increasingly integrated into chat assistants and other consumer AI applications. Without proper safeguards, however, VLMs may give harmful advice (e.g. how to self-harm) or encourage unsafe behaviours (e.g. to consume drugs). Despite these clear hazards, little work so far has evaluated VLM safety and the novel risks created by multimodal inputs. To address this gap, we introduce MSTS, a Multimodal Safety Test Suite for VLMs. MSTS comprises 400 test prompts across 40 fine-grained hazard categories. Each test prompt consists of a text and an image that only in combination reveal their full unsafe meaning. With MSTS, we find clear safety issues in several open VLMs. We also find some VLMs to be safe by accident, meaning that they are safe because they fail to understand even simple test prompts. We translate MSTS into…
Taxonomy
Topics: Natural Language Processing Techniques
