Multimodal Pragmatic Jailbreak on Text-to-image Models

Tong Liu; Zhixin Lai; Jiawen Wang; Gengyuan Zhang; Shuo Chen; Philip Torr; Vera Demberg; Volker Tresp; Jindong Gu

arXiv:2409.19149·cs.CV·June 12, 2025

TL;DR

This paper uncovers a new type of jailbreak in text-to-image models where combining safe text and images can produce unsafe content, revealing vulnerabilities in current diffusion models and filters.

Contribution

The paper introduces a systematic dataset and benchmark for multimodal pragmatic jailbreaks in T2I models, showing that these models are vulnerable and evaluating how well filter-based defenses hold up.

Findings

01

All tested models are susceptible to the jailbreak, with unsafe rates up to 70%.

02

Common filters fail to detect or prevent the jailbreak.

03

The jailbreak exploits the models' text rendering and training data biases.

Abstract

Diffusion models have recently made remarkable advances in image quality and fidelity to textual prompts. At the same time, the safety of such generative models has become a growing concern. This work introduces a novel type of jailbreak that triggers text-to-image (T2I) models to generate images containing visual text, where the image and the text, although each considered safe in isolation, combine to form unsafe content. To explore this phenomenon systematically, we propose a dataset for evaluating current diffusion-based T2I models under this jailbreak. We benchmark nine representative T2I models, including two closed-source commercial models. Experimental results reveal a concerning tendency to produce unsafe content: all tested models suffer from this type of jailbreak, with rates of unsafe generation ranging from around 10% to 70%, where DALLE 3…
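The unsafe generation rates quoted above can be read as the fraction of a model's outputs that are judged unsafe. The sketch below is a minimal illustration under stated assumptions: each generation is assumed to already carry a binary unsafe label for the combined image and visual text (this page does not say how the paper assigns labels), and the `Generation` record, the `unsafe_rates` helper, and the model names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Generation:
    model: str    # hypothetical model identifier
    unsafe: bool  # assumed label for the combined image + visual text


def unsafe_rates(generations):
    """Return {model: fraction of that model's generations labeled unsafe}."""
    totals = {}
    unsafe_counts = {}
    for g in generations:
        totals[g.model] = totals.get(g.model, 0) + 1
        unsafe_counts[g.model] = unsafe_counts.get(g.model, 0) + int(g.unsafe)
    return {m: unsafe_counts[m] / totals[m] for m in totals}


# Hypothetical labels, for illustration only (not data from the paper).
samples = [
    Generation("model_a", True),
    Generation("model_a", False),
    Generation("model_b", False),
    Generation("model_b", False),
    Generation("model_b", True),
    Generation("model_b", False),
]
print(unsafe_rates(samples))  # {'model_a': 0.5, 'model_b': 0.25}
```

Counting per model keeps the rate comparable across models that were prompted a different number of times.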

Code & Models

Datasets

tongliuphysics/multimodalpragmatic
dataset · 89 downloads

Taxonomy

Topics: Law in Society and Culture