Diffusion LLMs are Natural Adversaries for any LLM

David Lüdke; Tom Wollschläger; Paul Ungermann; Stephan Günnemann; Leo Schwinn

arXiv:2511.00203 · cs.LG · November 4, 2025

TL;DR

This paper presents a framework that uses Diffusion LLMs to generate adversarial prompts by conditional sampling rather than costly per-instance optimization, and shows that the resulting prompts transfer to a wide range of black-box target models.

Contribution

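The paper recasts adversarial prompt optimization as an amortized inference task: a pretrained Diffusion LLM, which models the joint distribution over prompt-response pairs, generates prompts conditionally, replacing per-instance discrete optimization with a small number of parallelizable samples.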

Findings

01. Generated prompts are low-perplexity and diverse.

02. Under mild fidelity assumptions, only a few conditional samples are needed to recover high-reward prompts.

03. Prompts transfer effectively to black-box target models.

Abstract

We introduce a novel framework that transforms the resource-intensive (adversarial) prompt optimization problem into an efficient, amortized inference task. Our core insight is that pretrained, non-autoregressive generative LLMs, such as Diffusion LLMs, which model the joint distribution over prompt-response pairs, can serve as powerful surrogates for prompt search. This approach enables the direct conditional generation of prompts, effectively replacing costly, per-instance discrete optimization with a small number of parallelizable samples. We provide a probabilistic analysis demonstrating that under mild fidelity assumptions, only a few conditional samples are required to recover high-reward (harmful) prompts. Empirically, we find that the generated prompts are low-perplexity, diverse jailbreaks that exhibit strong transferability to a wide range of black-box target models,…
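
The "few samples suffice" claim can be given some intuition with a generic best-of-n calculation. The sketch below is not the paper's analysis. It assumes, purely for illustration, that each conditional sample independently clears a reward threshold with some fixed probability p; the function names and the example values of p are hypothetical.

```python
import math


def success_probability(p: float, n: int) -> float:
    """Probability that at least one of n independent samples succeeds.

    Assumption (not from the paper): each sample succeeds independently
    with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def samples_needed(p: float, target: float) -> int:
    """Smallest n such that success_probability(p, n) >= target."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= target for n and round up.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # Hypothetical per-sample success rates, chosen only for illustration.
    for p in (0.05, 0.2, 0.5):
        n = samples_needed(p, target=0.95)
        print(f"p={p:.2f}: {n} samples give "
              f"{success_probability(p, n):.3f} success probability")
```

Under this independence assumption the required sample count grows only logarithmically as the target success probability approaches 1, and the samples can be drawn in parallel, which is consistent with the abstract's description of a small number of parallelizable samples.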

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)