Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models
Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee

TL;DR
This paper demonstrates that multimodal large language models are vulnerable to visual narrative-based jailbreaks, exposing safety risks and highlighting the need for more robust safety alignment methods.
Contribution
Introduces ComicJailbreak, a new benchmark for visual narrative jailbreaks, and evaluates the effectiveness of current safety defenses against these multimodal attacks.
Findings
ComicJailbreak contains 1,167 attack instances across 10 harm categories.
Comic-based attacks match strong rule-based jailbreaks across 15 MLLMs, with ensemble success rates exceeding 90% on several commercial models.
Current safety evaluators are unreliable on sensitive but non-harmful content.
Abstract
Multimodal Large Language Models (MLLMs) extend text-only LLMs with visual reasoning, but also introduce new safety failure modes under visually grounded instructions. We study comic-template jailbreaks that embed harmful goals inside simple three-panel visual narratives and prompt the model to role-play and "complete the comic." Building on JailbreakBench and JailbreakV, we introduce ComicJailbreak, a comic-based jailbreak benchmark with 1,167 attack instances spanning 10 harm categories and 5 task setups. Across 15 state-of-the-art MLLMs (six commercial and nine open-source), comic-based attacks achieve success rates comparable to strong rule-based jailbreaks and substantially outperform plain-text and random-image baselines, with ensemble success rates exceeding 90% on several commercial models. Then, with the existing defense methodologies, we show that these methods are effective…
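The abstract reports per-attack and "ensemble" success rates without defining the ensemble metric in the text available here. A minimal sketch of one common reading, assuming a harmful goal counts as broken when at least one task setup succeeds against it, might look like the following. The function name, the record layout, and the setup labels are all hypothetical, and the toy records are made up for illustration; they are not results from the paper.

```python
from collections import defaultdict


def attack_success_rates(records):
    """Compute per-setup and ensemble attack success rates.

    records: iterable of (goal_id, setup, success) tuples, where success is a
    bool produced by some safety judge (hypothetical layout, not the paper's).
    Assumption: the ensemble counts a goal as broken if ANY setup succeeded.
    """
    hits = defaultdict(int)          # successful attacks per setup
    totals = defaultdict(int)        # attempted attacks per setup
    goal_broken = defaultdict(bool)  # goal_id -> broken by at least one setup

    for goal_id, setup, success in records:
        totals[setup] += 1
        hits[setup] += int(bool(success))
        goal_broken[goal_id] = goal_broken[goal_id] or bool(success)

    per_setup = {setup: hits[setup] / totals[setup] for setup in totals}
    ensemble = sum(goal_broken.values()) / len(goal_broken) if goal_broken else 0.0
    return per_setup, ensemble


# Toy data for illustration only: three goals, two hypothetical setups.
toy_records = [
    ("g1", "setup_a", True), ("g1", "setup_b", False),
    ("g2", "setup_a", False), ("g2", "setup_b", False),
    ("g3", "setup_a", False), ("g3", "setup_b", True),
]
per_setup, ensemble = attack_success_rates(toy_records)
```

Under this reading the ensemble rate can never fall below the best single-setup rate, which is one reason an ensemble figure above 90% should not be compared directly with single-attack baselines.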
