Where and How to Perturb: On the Design of Perturbation Guidance in Diffusion and Flow Models

Donghoon Ahn; Jiwon Kang; Sanghyun Lee; Minjae Kim; Jaewon Min; Wooseok Jang; Sangwu Lee; Sayak Paul; Susung Hong; Seungryong Kim

arXiv:2506.10978 · cs.CV · November 4, 2025

TL;DR

This paper introduces HeadHunter, a method for selecting which attention heads to perturb, and SoftPAG, a method for continuously controlling perturbation strength. Together they enable fine-grained, interpretable control over image generation quality and style attributes in diffusion models.

Contribution

It provides the first head-level analysis of attention perturbation, revealing head specialization and proposing systematic, user-centric perturbation strategies for diffusion models.

Findings

01. Attention heads govern distinct visual concepts such as structure, style, and texture.
02. HeadHunter effectively selects attention heads aligned with user objectives.
03. SoftPAG offers continuous control over perturbation strength, improving generation quality.
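
To make the idea of head-level perturbation with continuous strength concrete, here is a minimal NumPy sketch. It rests on two assumptions that this page does not spell out: that perturbing a head means replacing its attention map with the identity matrix (so each token attends only to itself), and that continuous control means linearly interpolating between the original map and the identity. The function name, argument names, and tensor layout are hypothetical and chosen for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_perturbed_attention(q, k, v, heads, strength):
    """Self-attention with a soft perturbation applied to selected heads.

    q, k, v:  arrays of shape (num_heads, num_tokens, head_dim)
    heads:    indices of the heads to perturb (assumed to be chosen elsewhere)
    strength: value in [0, 1]; 0 leaves a head untouched, 1 replaces its
              attention map with the identity (assumed perturbation)
    """
    num_heads, num_tokens, head_dim = q.shape
    # Standard scaled dot-product attention maps, one per head.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim))
    identity = np.eye(num_tokens)
    for h in heads:
        # Assumed soft perturbation: blend the head's map toward the identity.
        attn[h] = (1.0 - strength) * attn[h] + strength * identity
    return attn @ v
```

Under these assumptions, a strength of 1 on a selected head makes that head pass its value vectors through unchanged, while heads that are not selected behave exactly as in ordinary attention.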

Abstract

Recent guidance methods in diffusion models steer reverse sampling by perturbing the model to construct an implicit weak model and guide generation away from it. Among these approaches, attention perturbation has demonstrated strong empirical performance in unconditional scenarios where classifier-free guidance is not applicable. However, existing attention perturbation methods lack principled approaches for determining where perturbations should be applied, particularly in Diffusion Transformer (DiT) architectures where quality-relevant computations are distributed across layers. In this paper, we investigate the granularity of attention perturbations, ranging from the layer level down to individual attention heads, and discover that specific heads govern distinct visual concepts such as structure, style, and texture quality. Building on this insight, we propose "HeadHunter", a…
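
The phrase "guide generation away from it" can be illustrated with a short sketch. It assumes the linear extrapolation form commonly used by guidance methods, in which the guided prediction moves from the weak (perturbed) prediction past the original one. The truncated abstract does not give the paper's exact formulation, so the function name, argument names, and the `scale` parameter below are hypothetical.

```python
import numpy as np

def perturbation_guidance(pred_full, pred_weak, scale):
    """Extrapolate away from the weak model's prediction.

    pred_full: denoiser output from the unmodified model
    pred_weak: denoiser output from the perturbed (implicit weak) model
    scale:     guidance strength; 0 returns pred_full unchanged
    """
    pred_full = np.asarray(pred_full, dtype=float)
    pred_weak = np.asarray(pred_weak, dtype=float)
    # Step further in the direction that separates the full model
    # from its weakened counterpart (assumed linear form).
    return pred_full + scale * (pred_full - pred_weak)
```

With `scale = 0` the sampler uses the unmodified prediction. Larger values push each step further from what the weak model would have produced.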

Taxonomy

Methods: Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer · Attention Is All You Need · Diffusion