Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study

DongGeon Lee; Joonwon Jang; Jihae Jeong; Hwanjo Yu

arXiv:2505.15389 · cs.CL · September 24, 2025

TL;DR

This paper evaluates the safety of vision-language models (VLMs) on real-world meme images, finds that memes make the models more likely to produce harmful outputs, and argues for ecologically valid safety assessments.

Contribution

Introduces MemeSafetyBench, a 50,430-instance benchmark for assessing VLM safety with real meme images, and analyzes model vulnerabilities and the mitigating effect of multi-turn conversational context.

Findings

01

VLMs are more vulnerable to meme-based harmful prompts than to synthetic or typographic images.

02

Memes increase harmful responses and reduce safety refusals.

03

Multi-turn interactions only partially mitigate meme-induced risks.

Abstract

Rapid deployment of vision-language models (VLMs) magnifies safety risks, yet most evaluations rely on artificial images. This study asks: How safe are current VLMs when confronted with meme images that ordinary users share? To investigate this question, we introduce MemeSafetyBench, a 50,430-instance benchmark pairing real meme images with both harmful and benign instructions. Using a comprehensive safety taxonomy and LLM-based instruction generation, we assess multiple VLMs across single and multi-turn interactions. We investigate how real-world memes influence harmful outputs, the mitigating effects of conversational context, and the relationship between model scale and safety metrics. Our findings demonstrate that VLMs are more vulnerable to meme-based harmful prompts than to synthetic or typographic images. Memes significantly increase harmful responses and decrease refusals…
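
The abstract reports that memes raise harmful responses and lower refusals, but the truncated text does not define those metrics. As a rough sketch only, assuming each model response has already been labeled by some safety judge as harmful and/or refused, per-condition rates could be tallied as below. The record layout, the condition names, and the `safety_rates` helper are hypothetical illustrations, not the authors' evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgedResponse:
    """One judged model response (hypothetical record layout)."""
    condition: str  # assumed label, e.g. "synthetic_image" or "meme_image"
    harmful: bool   # judge marked the response as harmful
    refused: bool   # judge marked the response as a refusal


def safety_rates(records):
    """Return {condition: (harmful_rate, refusal_rate)} as fractions in [0, 1]."""
    counts = defaultdict(lambda: [0, 0, 0])  # [total, harmful, refused]
    for r in records:
        c = counts[r.condition]
        c[0] += 1
        c[1] += int(r.harmful)
        c[2] += int(r.refused)
    # A condition only appears once a record was counted, so n is never zero.
    return {cond: (h / n, f / n) for cond, (n, h, f) in counts.items()}


# Made-up judgments purely to exercise the function; not results from the paper.
toy = [
    JudgedResponse("synthetic_image", harmful=False, refused=True),
    JudgedResponse("synthetic_image", harmful=False, refused=True),
    JudgedResponse("meme_image", harmful=True, refused=False),
    JudgedResponse("meme_image", harmful=False, refused=True),
]
rates = safety_rates(toy)
```

Keeping harmfulness and refusal as separate flags is a deliberate choice in this sketch: a response can be neither (for example, an unhelpful but harmless answer), so the two rates need not sum to one.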

Code & Models

Datasets

oneonlee/Meme-Safety-Bench
dataset · 72 downloads

Taxonomy

Topics: Language and cultural evolution · AI in Service Interactions · Middle East and Rwanda Conflicts