AttnRouter: Per-Category Attention Routing for Training-Free Image Editing on MMDiT

Guandong Li; Mengxia Ye

arXiv:2605.01480 · cs.CV · May 5, 2026

TL;DR

This paper studies training-free image editing on Qwen-Image-Edit-2511, a 60-block multi-modal diffusion transformer (MMDiT). It introduces a single-forward attention manipulation (KVInject) and a per-category routing table (AttnRouter) that dispatches each edit to the attention operation that best preserves source structure.

Contribution

It proposes KVInject, a single-forward attention manipulation; introduces AttnRouter, a per-category routing table; and localizes the attention sub-circuits that are effective for image editing.

Findings

01 KVInject needs only a single forward pass and avoids the prompt-mismatch failure mode that disables MasaCtrl on MMDiT.

02 AttnRouter improves the CLIP-T+DINO-I composite by dispatching each edit category to the attention operation that best preserves source structure.

03 Injection in early denoising steps recovers most editing gains.
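The routing idea in finding 02 can be illustrated with a small lookup table. This is a minimal sketch, not the paper's router: the category names, operation names, table entries, and the `route` helper are all hypothetical placeholders, and the operations return strings instead of running a diffusion model.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for attention operations; a real operation would
# run the editing pipeline with the corresponding attention manipulation.
def run_baseline(prompt: str) -> str:
    return f"baseline:{prompt}"

def run_kv_inject(prompt: str) -> str:
    return f"kv_inject:{prompt}"

OPERATIONS: Dict[str, Callable[[str], str]] = {
    "baseline": run_baseline,
    "kv_inject": run_kv_inject,
}

# Hypothetical routing table: edit category -> operation name.
# These assignments are illustrative only, not results from the paper.
ROUTING_TABLE: Dict[str, str] = {
    "add_object": "kv_inject",
    "change_style": "baseline",
}

def route(category: str, prompt: str, default: str = "baseline") -> str:
    """Dispatch an edit to the operation listed for its category."""
    op_name = ROUTING_TABLE.get(category, default)
    return OPERATIONS[op_name](prompt)
```

Falling back to a default operation for unlisted categories is a design choice of this sketch; the source does not say how unknown categories are handled.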

Abstract

We study training-free image editing on Qwen-Image-Edit-2511, a 60-block multi-modal diffusion transformer (MMDiT) that concatenates noise and source-image tokens within a single attention stream. We make three contributions. (i) We introduce KVInject, a single-forward attention manipulation that alpha-blends source-half key/value projections into the noise-half within a localized layer/step band. KVInject is simpler than the classical two-pass MasaCtrl recipe and avoids the prompt-mismatch failure mode that disables MasaCtrl on MMDiT (composite score drops 31% versus baseline). (ii) We show that no single attention operation dominates across edit types, motivating AttnRouter, a per-category routing table that dispatches edits to the operation that best preserves source structure for that type. With ground-truth categories the router improves the CLIP-T+DINO-I composite by 6.4% over the…
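The alpha-blend described in contribution (i) can be sketched in a few lines. This is an assumption-laden illustration rather than the authors' implementation: it treats a key or value projection as a list of token rows, assumes the noise half and the source half have equal length and aligned positions, ignores text tokens, and uses a hypothetical `in_band` check for the layer/step band (the actual band boundaries are not given in the text above).

```python
from typing import List, Tuple

Matrix = List[List[float]]  # one row per token

def kv_inject(proj: Matrix, n_noise: int, alpha: float) -> Matrix:
    """Blend source-half rows into noise-half rows of a K or V projection.

    Assumption of this sketch: rows [0, n_noise) are noise tokens and
    rows [n_noise, 2 * n_noise) are the aligned source-image tokens.
    """
    if len(proj) != 2 * n_noise:
        raise ValueError("expected equal-length noise and source halves")
    noise, source = proj[:n_noise], proj[n_noise:]
    blended = [
        [(1.0 - alpha) * n + alpha * s for n, s in zip(n_row, s_row)]
        for n_row, s_row in zip(noise, source)
    ]
    # The source half is passed through unchanged.
    return blended + [row[:] for row in source]

def in_band(layer: int, step: int, layers: range, steps: range) -> bool:
    """Hypothetical band check: inject only in the chosen layers and steps."""
    return layer in layers and step in steps

def maybe_inject(k: Matrix, v: Matrix, layer: int, step: int, n_noise: int,
                 alpha: float, layers: range, steps: range) -> Tuple[Matrix, Matrix]:
    """Apply the blend to both K and V inside the band, else return them as-is."""
    if not in_band(layer, step, layers, steps):
        return k, v
    return kv_inject(k, n_noise, alpha), kv_inject(v, n_noise, alpha)
```

With `alpha = 0` the projections are untouched, and with `alpha = 1` the noise half attends with the source half's keys and values; because the blend happens inside one attention call, no second forward pass is needed.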
