Universally Unfiltered and Unseen: Input-Agnostic Multimodal Jailbreaks against Text-to-Image Model Safeguards

Song Yan; Hui Wei; Jinlong Fei; Guoliang Yang; Zhengyu Zhao; Zheng Wang

arXiv:2508.05658 · cs.CR · August 12, 2025

TL;DR

This paper introduces U3-Attack, a scalable multimodal jailbreak method that bypasses T2I safeguards using a universal adversarial background patch and a universal set of safe paraphrases, outperforming existing attacks.

Contribution

The paper presents a novel universal multimodal jailbreak approach that improves scalability and success rates over prior prompt-specific and image-specific methods.

Findings

01. U3-Attack achieves ~4x higher success rates than previous methods.

02. It bypasses both prompt filters and safety checkers.

03. It is demonstrated on multiple open-source and commercial T2I models.

Abstract

Various (text) prompt filters and (image) safety checkers have been implemented to mitigate the misuse of Text-to-Image (T2I) models in creating Not-Safe-For-Work (NSFW) content. In order to expose potential security vulnerabilities of such safeguards, multimodal jailbreaks have been studied. However, existing jailbreaks are limited to prompt-specific and image-specific perturbations, which suffer from poor scalability and time-consuming optimization. To address these limitations, we propose Universally Unfiltered and Unseen (U3)-Attack, a multimodal jailbreak attack method against T2I safeguards. Specifically, U3-Attack optimizes an adversarial patch on the image background to universally bypass safety checkers and optimizes a safe paraphrase set from a sensitive word to universally bypass prompt filters while eliminating redundant computations. Extensive experimental results…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection