SpotEdit: Selective Region Editing in Diffusion Transformers
Zhibin Qin, Zhenxiong Tan, Zeqing Wang, Songhua Liu, Xinchao Wang

TL;DR
SpotEdit introduces a training-free diffusion editing framework that selectively updates only modified regions, reducing computation and preserving unaltered areas for efficient and high-quality image editing.
Contribution
The paper proposes a training-free selective editing method built on two components, SpotSelector and SpotFusion, which improves editing efficiency and fidelity without any retraining.
Findings
Reduces unnecessary computation during image editing.
Maintains high fidelity in unmodified regions.
Achieves efficient and precise editing without retraining.
Abstract
Diffusion Transformer models have significantly advanced image editing by encoding conditional images and integrating them into transformer layers. However, most edits involve modifying only small regions, while current methods uniformly process and denoise all tokens at every timestep, causing redundant computation and potentially degrading unchanged areas. This raises a fundamental question: Is it truly necessary to regenerate every region during editing? To address this, we propose SpotEdit, a training-free diffusion editing framework that selectively updates only the modified regions. SpotEdit comprises two key components: SpotSelector identifies stable regions via perceptual similarity and skips their computation by reusing conditional image features; SpotFusion adaptively blends these features with edited tokens through a dynamic fusion mechanism, preserving contextual coherence…
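The selective-update idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: cosine similarity stands in for the perceptual similarity used by SpotSelector, a fixed `blend` weight stands in for SpotFusion's dynamic fusion, and `selective_step`, `select_stable`, `toy_denoise`, `threshold`, and `blend` are all hypothetical names chosen for illustration. Tokens are plain lists of floats rather than transformer features.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (plain lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def select_stable(current, condition, threshold=0.95):
    """Mark a token as stable when it is close to the matching condition token.

    Assumption: cosine similarity is a stand-in for a perceptual metric.
    """
    return [cosine_similarity(c, r) >= threshold for c, r in zip(current, condition)]

def selective_step(current, condition, denoise_fn, threshold=0.95, blend=0.5):
    """Run one toy editing step that only denoises non-stable tokens."""
    stable = select_stable(current, condition, threshold)
    # Only the non-stable (edited) tokens are sent through the expensive model.
    active = [tok for tok, s in zip(current, stable) if not s]
    updated = iter(denoise_fn(active)) if active else iter([])
    out = []
    for cur, ref, s in zip(current, condition, stable):
        if s:
            # Stable token: skip computation and reuse the condition feature,
            # blended with the current token (fixed weight, an assumption).
            out.append([blend * r + (1.0 - blend) * c for c, r in zip(cur, ref)])
        else:
            out.append(next(updated))
    return out, stable

# Toy usage: the "model" doubles each value and records how many tokens it saw.
calls = []

def toy_denoise(tokens):
    calls.append(len(tokens))
    return [[2.0 * x for x in t] for t in tokens]

current = [[1.0, 0.0], [0.0, 1.0]]
condition = [[1.0, 0.0], [1.0, 0.0]]
out, stable = selective_step(current, condition, toy_denoise)
```

In this toy run the first token matches the condition image and is reused, so the stand-in model only processes the second token. The saving in a real diffusion transformer would depend on how many tokens are judged stable at each timestep.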
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques · Advanced Image Fusion Techniques
