Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study
DongGeon Lee, Joonwon Jang, Jihae Jeong, Hwanjo Yu

TL;DR
This paper evaluates the safety of vision-language models when faced with real-world meme images, revealing increased vulnerability to harmful outputs and emphasizing the importance of ecologically valid safety assessments.
Contribution
Introduces MemeSafetyBench, a large benchmark dataset for assessing VLM safety with memes, and provides comprehensive analysis of model vulnerabilities and mitigation strategies.
Findings
VLMs are more vulnerable to harmful prompts paired with memes than to those paired with synthetic or typographic images.
Memes increase harmful responses and reduce safety refusals.
Multi-turn interactions only partially mitigate meme-induced risks.
Abstract
Rapid deployment of vision-language models (VLMs) magnifies safety risks, yet most evaluations rely on artificial images. This study asks: How safe are current VLMs when confronted with meme images that ordinary users share? To investigate this question, we introduce MemeSafetyBench, a 50,430-instance benchmark pairing real meme images with both harmful and benign instructions. Using a comprehensive safety taxonomy and LLM-based instruction generation, we assess multiple VLMs across single and multi-turn interactions. We investigate how real-world memes influence harmful outputs, the mitigating effects of conversational context, and the relationship between model scale and safety metrics. Our findings demonstrate that VLMs are more vulnerable to meme-based harmful prompts than to synthetic or typographic images. Memes significantly increase harmful responses and decrease refusals…
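The abstract compares harmful-response and refusal behavior across image conditions. A minimal sketch of how such per-condition rates could be tallied is given below. It assumes each model response has already been judged and labeled; the field names (`condition`, `harmful`, `refused`), the function `safety_rates`, and the toy records are hypothetical and are not taken from the paper or from MemeSafetyBench.

```python
from collections import defaultdict


def safety_rates(records):
    """Return {condition: (harmful_rate, refusal_rate)} from judged responses.

    Each record is a dict with hypothetical keys:
      condition: image condition label, e.g. "meme" or "synthetic"
      harmful:   True if the response was judged harmful
      refused:   True if the model refused the instruction
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [total, harmful, refused]
    for r in records:
        c = counts[r["condition"]]
        c[0] += 1
        c[1] += int(r["harmful"])
        c[2] += int(r["refused"])
    return {k: (h / n, f / n) for k, (n, h, f) in counts.items()}


# Placeholder records for illustration only; these are not results from the paper.
toy = [
    {"condition": "synthetic", "harmful": False, "refused": True},
    {"condition": "synthetic", "harmful": False, "refused": True},
    {"condition": "meme", "harmful": True, "refused": False},
    {"condition": "meme", "harmful": False, "refused": True},
]

rates = safety_rates(toy)
for condition, (harmful_rate, refusal_rate) in sorted(rates.items()):
    print(f"{condition}: harmful={harmful_rate:.2f} refusal={refusal_rate:.2f}")
```

Keeping the two rates separate matters because a response can be neither harmful nor a refusal (for example, an off-topic answer), so one rate is not simply the complement of the other.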
Taxonomy
Topics: Language and cultural evolution · AI in Service Interactions · Middle East and Rwanda Conflicts
