Diffusion LLMs are Natural Adversaries for any LLM
David Lüdke, Tom Wollschläger, Paul Ungermann, Stephan Günnemann, Leo Schwinn

TL;DR
This paper presents a framework that uses Diffusion LLMs to generate adversarial prompts efficiently, replacing costly per-instance optimization with probabilistic sampling, and shows that the generated prompts transfer to a wide range of black-box target models.
Contribution
The paper introduces a diffusion-based approach for prompt generation that significantly reduces computational costs and improves transferability of adversarial prompts.
Findings
Generated prompts are low-perplexity and diverse.
Few samples are needed to recover high-reward prompts.
Prompts transfer effectively to black-box models.
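
The finding that few samples suffice can be illustrated with a standard best-of-n calculation. The sketch below is not the paper's analysis: it assumes, purely for illustration, that each conditional sample independently yields a high-reward prompt with a fixed probability p. The function names are hypothetical.

```python
import math


def success_probability(p: float, n: int) -> float:
    """Probability that at least one of n independent samples succeeds.

    Assumption (illustrative only): every sample succeeds independently
    with the same probability p.
    """
    return 1.0 - (1.0 - p) ** n


def samples_needed(p: float, target: float) -> int:
    """Smallest n such that 1 - (1 - p)^n >= target."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    if p == 1.0:
        return 1  # a single sample always succeeds
    # Solve (1 - p)^n <= 1 - target for n and round up.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
```

Under this assumption, a per-sample success rate of 0.2 would need 11 samples to reach a 90% chance of at least one success, and the samples can be drawn in parallel.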
Abstract
We introduce a novel framework that transforms the resource-intensive (adversarial) prompt optimization problem into an efficient, amortized inference task. Our core insight is that pretrained, non-autoregressive generative LLMs, such as Diffusion LLMs, which model the joint distribution over prompt-response pairs, can serve as powerful surrogates for prompt search. This approach enables the direct conditional generation of prompts, effectively replacing costly, per-instance discrete optimization with a small number of parallelizable samples. We provide a probabilistic analysis demonstrating that under mild fidelity assumptions, only a few conditional samples are required to recover high-reward (harmful) prompts. Empirically, we find that the generated prompts are low-perplexity, diverse jailbreaks that exhibit strong transferability to a wide range of black-box target models,…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)
