HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads
Yu Xu, Fan Tang, Juan Cao, Yuxin Zhang, Xiaoyu Kong, Jintao Li, Oliver Deussen, Tong-Yee Lee

TL;DR
HeadRouter is a training-free framework that adaptively routes attention heads in MM-DiTs for accurate, text-guided image editing, improving semantic alignment without additional training.
Contribution
The paper introduces HeadRouter, a text-guided image editing method for MM-DiTs that requires no retraining and exploits the differing sensitivity of individual attention heads to image semantics.
Findings
Improves editing fidelity and image quality on multiple benchmarks.
Effectively aligns edited images with textual guidance.
Enhances semantic precision through dual-token refinement.
Abstract
Diffusion Transformers (DiTs) have exhibited robust capabilities in image generation tasks. However, accurate text-guided image editing for multimodal DiTs (MM-DiTs) still poses a significant challenge. Unlike UNet-based structures that can utilize self/cross-attention maps for semantic editing, MM-DiTs inherently lack support for explicitly and consistently incorporated text guidance, resulting in semantic misalignment between the edited results and texts. In this study, we disclose the sensitivity of different attention heads to different image semantics within MM-DiTs and introduce HeadRouter, a training-free image editing framework that edits the source image by adaptively routing the text guidance to different attention heads in MM-DiTs. Furthermore, we present a dual-token refinement module to refine text/image token representations for precise semantic guidance and accurate region…
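To make the idea of routing text guidance across attention heads more concrete, here is a toy sketch in NumPy. It is not the authors' implementation: the function `route_heads`, the sensitivity measure (mean absolute change of a head's output between source and target prompts), and the `temperature` parameter are all assumptions chosen for illustration. The sketch only shows the general pattern of weighting per-head edits by a per-head sensitivity score.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def route_heads(src_heads, tgt_heads, temperature=1.0):
    """Blend per-head attention outputs using hypothetical sensitivity weights.

    src_heads, tgt_heads: arrays of shape (num_heads, num_tokens, head_dim),
    standing in for head outputs under the source and target prompts.
    Returns the routed head outputs and the per-head routing weights.
    """
    delta = tgt_heads - src_heads
    # Assumed sensitivity score: how strongly each head reacts to the prompt change.
    sensitivity = np.mean(np.abs(delta), axis=(1, 2))
    # Routing weights over heads; they sum to 1.
    weights = softmax(sensitivity / temperature)
    # Rescale so that a uniform routing corresponds to a gain of 1 per head.
    gains = weights * weights.shape[0]
    # Heads judged more sensitive receive a larger share of the edit.
    routed = src_heads + gains[:, None, None] * delta
    return routed, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(4, 6, 8))
    tgt = src.copy()
    tgt[2] += 1.0  # only head 2 responds to the (simulated) prompt change
    routed, weights = route_heads(src, tgt)
    print("routing weights:", weights)
```

In this toy setup only the head whose output changes receives a non-trivial edit, and heads that do not react to the prompt change are left at their source values. How HeadRouter actually measures sensitivity and applies guidance is described in the paper itself, not here.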
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications
Methods: Softmax · Attention Is All You Need
