TL;DR
SpecEdit is a training-free, dynamic-resolution framework that accelerates diffusion-based image editing by selectively applying high-resolution denoising only to edit-relevant regions, maintaining quality while significantly reducing computation.
Contribution
SpecEdit introduces a draft-and-verify scheme for semantic-aware resolution adjustment, reaching up to 13x acceleration when combined with existing acceleration methods.
Findings
Achieves up to 10x and 7x acceleration on benchmark datasets.
Maintains strong editing quality despite reduced computation.
Complementary to existing acceleration techniques, boosting overall speedup.
Abstract
Diffusion-based image editing offers strong semantic controllability, but remains computationally expensive due to iterative high-resolution denoising over all spatial tokens. Dynamic-resolution sampling reduces this cost by performing early steps at reduced resolution. However, existing approaches prioritize upsampling using low-level heuristics such as edge detection or channel variance, which are weakly aligned with editing semantics and may lead to structural inconsistency. Moreover, spatial regions are often upsampled without verifying whether semantic modification is actually required, resulting in redundant high-resolution computation and accumulated errors. Therefore, we propose SpecEdit, a training-free dynamic-resolution framework tailored for diffusion-based image editing. SpecEdit follows a draft-and-verify scheme: a low-resolution draft first estimates the semantic outcome,…
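The abstract is cut off before it describes the verify step, so the following is only a minimal sketch of the general draft-and-verify idea, not the authors' algorithm. It assumes a toy criterion: a low-resolution cell counts as edit-relevant when the draft differs from the downsampled source by more than a threshold. The names avg_pool, verify_mask, high_res_fraction and the threshold tau are hypothetical, and the "draft" is a hand-made stand-in rather than the output of a diffusion model.

```python
def avg_pool(img, f):
    """Downsample a 2D grid by averaging non-overlapping f x f blocks."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y * f + dy][x * f + dx] for dy in range(f) for dx in range(f))
            / (f * f)
            for x in range(w // f)
        ]
        for y in range(h // f)
    ]


def verify_mask(source_lr, draft_lr, tau):
    """Flag low-res cells whose draft differs from the source by more than tau.

    Assumed criterion for illustration only; the paper's verification rule
    is not given in the truncated abstract.
    """
    return [
        [abs(d - s) > tau for s, d in zip(src_row, draft_row)]
        for src_row, draft_row in zip(source_lr, draft_lr)
    ]


def high_res_fraction(mask):
    """Share of cells that would receive high-resolution denoising."""
    flat = [m for row in mask for m in row]
    return sum(flat) / len(flat)


# Toy data: an 8x8 source and a stand-in "edited" image in which only the
# top-left 4x4 patch changes.
source = [[0.0] * 8 for _ in range(8)]
edited = [[1.0 if (y < 4 and x < 4) else 0.0 for x in range(8)] for y in range(8)]

source_lr = avg_pool(source, 2)  # 4x4 low-resolution source
draft_lr = avg_pool(edited, 2)   # 4x4 stand-in for the low-resolution draft
mask = verify_mask(source_lr, draft_lr, tau=0.5)
fraction = high_res_fraction(mask)
print(fraction)
```

In this toy case only the cells covering the edited patch are flagged, so the remaining cells could stay at low resolution. How much real computation that saves depends on the model and on the actual verification rule, neither of which this sketch represents.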
