TL;DR
This paper introduces GPO-V, a novel jailbreak method for diffusion vision-language models that exploits global probability dynamics to bypass safety guardrails, revealing significant security vulnerabilities.
Contribution
The paper proposes Global Probability Optimization (GPO), a jailbreak paradigm for attacking diffusion models, and introduces GPO-V, the first visual-modality jailbreak framework for dVLMs.
Findings
GPO-V effectively creates stealthy, transferable perturbations.
Current defense strategies are insufficient against GPO-based attacks.
GPO-V exposes critical security gaps in non-sequential diffusion models.
Abstract
Diffusion Vision-Language Models (dVLMs), built upon the non-causal foundations of Diffusion Large Language Models (dLLMs), have demonstrated remarkable efficacy in multimodal tasks by departing from the traditional autoregressive generation paradigm. While dVLMs appear inherently robust against conventional jailbreak tactics, which we categorize as Fixed Prefix Optimization (FPO) (e.g., anchoring responses with "Sure, here is"), this perceived resilience is deceptive. Our investigation into the safety landscape of dVLMs reveals two distinct refusal patterns: Immediate Refusal and Progressive Refusal. We find that while FPO-based attacks often fail by triggering the latter, the progressive refinement process itself uncovers a novel, latent attack surface. To exploit this vulnerability, we propose Global Probability Optimization (GPO), a general jailbreak paradigm designed specifically for…
