JANUS: A Lightweight Framework for Jailbreaking Text-to-Image Models via Distribution Optimization

Haolun Zheng; Yu He; Tailun Chen; Shuo Shao; Zhixuan Chu; Hongbin Zhou; Lan Tao; Zhan Qin; Kui Ren

arXiv:2603.21208 · cs.CV · March 27, 2026

TL;DR

JANUS is a lightweight framework that jailbreaks text-to-image models by optimizing a structured prompt distribution under a black-box, end-to-end reward. It exposes weaknesses in current safety filters and outperforms existing jailbreak methods.

Contribution

The paper introduces JANUS, a framework that formulates jailbreaking as prompt distribution optimization. It replaces large, costly RL-trained generators with a low-dimensional mixing policy and improves success rates over prior approaches.

Findings

01. JANUS achieves a jailbreak success rate of 43.15% on Stable Diffusion 3.5.

02. JANUS outperforms state-of-the-art methods in success rate and safety filter bypass.

03. The approach exposes structural weaknesses in current T2I safety defenses.

Abstract

Text-to-image (T2I) models such as Stable Diffusion and DALL·E remain susceptible to generating harmful or Not-Safe-For-Work (NSFW) content under jailbreak attacks despite deployed safety filters. Existing jailbreak attacks either rely on proxy-loss optimization instead of the true end-to-end objective, or depend on large-scale and costly RL-trained generators. Motivated by these limitations, we propose JANUS, a lightweight framework that formulates jailbreak as optimizing a structured prompt distribution under a black-box, end-to-end reward from the T2I system and its safety filters. JANUS replaces a high-capacity generator with a low-dimensional mixing policy over two semantically anchored prompt distributions, enabling efficient exploration while preserving the target semantics. On modern T2I models, we outperform state-of-the-art jailbreak methods, improving ASR-8 from 25.30% to…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Security and Verification in Computing