MSTS: A Multimodal Safety Test Suite for Vision-Language Models

Paul Röttger; Giuseppe Attanasio; Felix Friedrich; Janis Goldzycher; Alicia Parrish; Rishabh Bhardwaj; Chiara Di Bonaventura; Roman Eng; Gaia El Khoury Geagea; Sujata Goswami; Jieun Han; Dirk Hovy; Seogyeong Jeong; Paloma Jeretič; Flor Miriam Plaza-del-Arco; Donya Rooein; Patrick Schramowski; Anastassia Shaitarova; Xudong Shen; Richard Willats; Andrea Zugarini; Bertie Vidgen

arXiv:2501.10057 · cs.CL · January 20, 2025

TL;DR

This paper introduces MSTS, a comprehensive multimodal safety test suite for vision-language models, revealing significant safety issues and highlighting the need for improved safety evaluation methods.

Contribution

The paper presents MSTS, a novel multimodal safety testing framework with 400 prompts across 40 hazard categories, and analyzes safety gaps in current VLMs.

Findings

01. Several open VLMs exhibit safety issues.
02. Some models are safe only by failing to understand prompts.
03. Multilingual prompts increase unsafe responses.

Abstract

Vision-language models (VLMs), which process image and text inputs, are increasingly integrated into chat assistants and other consumer AI applications. Without proper safeguards, however, VLMs may give harmful advice (e.g. how to self-harm) or encourage unsafe behaviours (e.g. to consume drugs). Despite these clear hazards, little work so far has evaluated VLM safety and the novel risks created by multimodal inputs. To address this gap, we introduce MSTS, a Multimodal Safety Test Suite for VLMs. MSTS comprises 400 test prompts across 40 fine-grained hazard categories. Each test prompt consists of a text and an image that only in combination reveal their full unsafe meaning. With MSTS, we find clear safety issues in several open VLMs. We also find some VLMs to be safe by accident, meaning that they are safe because they fail to understand even simple test prompts. We translate MSTS into…
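As a rough illustration of the structure described in the abstract (each test prompt pairs a text with an image and belongs to one hazard category), the sketch below shows one way such test cases could be represented and how a per-category rate of unsafe responses could be tallied. The class name, field names, category labels, and the `labels` mapping are all hypothetical; they are not taken from the MSTS release or its repository.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MultimodalPrompt:
    """One test case: a text and an image that are meant to be read together."""
    prompt_id: str
    hazard_category: str  # hypothetical label, not the MSTS taxonomy
    text: str
    image_path: str


def unsafe_rate_by_category(prompts, labels):
    """Return the share of responses judged unsafe for each hazard category.

    `labels` maps prompt_id -> True if the model's response was judged unsafe.
    How responses are judged is outside the scope of this sketch.
    """
    totals = defaultdict(int)
    unsafe = defaultdict(int)
    for prompt in prompts:
        totals[prompt.hazard_category] += 1
        if labels[prompt.prompt_id]:
            unsafe[prompt.hazard_category] += 1
    return {category: unsafe[category] / totals[category] for category in totals}
```

Given a list of `MultimodalPrompt` records and a mapping of judged responses, `unsafe_rate_by_category(prompts, labels)` returns a dictionary keyed by category, which makes it easy to compare models category by category.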

Code & Models

Repositories

paul-rottger/msts-multimodal-safety (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques