TL;DR
HierEdit introduces a region-aware hierarchical diffusion framework that enables efficient, high-fidelity high-resolution image editing up to 4K by focusing on edited regions and reusing unaltered areas.
Contribution
The paper presents HierEdit, a novel hierarchical diffusion method that accelerates high-resolution editing without requiring high-res training data.
Findings
Achieves high-quality 4K image editing with fast inference.
Significantly reduces computational costs compared to existing methods.
Maintains consistent global semantics during editing.
Abstract
High-resolution image editing is essential for professional and creative applications, yet existing multimodal diffusion-based editors remain computationally inefficient and constrained to relatively low resolutions. Current approaches redundantly process the entire image canvas or rely on large-scale high-resolution datasets, resulting in substantial training and inference costs. We introduce HierEdit, a region-aware hierarchical diffusion framework designed for efficient and scalable high-resolution image editing. Our method first performs edits on a low-resolution proxy using an off-the-shelf editing model to generate a reference and to localize the modified regions. A hierarchical local-window diffusion model (Local-Window MMDiT) then refines only the edited regions within the original high-resolution image, while reusing the unaltered regions as conditioning inputs. The…
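The region-localization idea in the abstract can be illustrated with a toy sketch. This is not the paper's method: the abstract does not say how modified regions are detected, so the per-pixel difference threshold, the fixed tile grid, and the names `changed_windows`, `window`, `threshold`, and `scale` are all assumptions made for illustration. The sketch compares a low-resolution proxy before and after editing, and returns the high-resolution boxes that a local refinement stage would need to process; every other tile could be reused unchanged.

```python
def changed_windows(original, edited, window=2, threshold=0.1, scale=4):
    """Return high-res boxes (x0, y0, x1, y1) for proxy tiles that changed.

    Hypothetical helper: `original` and `edited` are low-res grayscale
    proxies given as equally sized lists of rows of floats. `scale` is the
    assumed ratio between high-res and proxy resolution.
    """
    height, width = len(original), len(original[0])
    boxes = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            bottom = min(top + window, height)
            right = min(left + window, width)
            # A tile counts as edited if any pixel moved by more than the threshold.
            changed = any(
                abs(original[y][x] - edited[y][x]) > threshold
                for y in range(top, bottom)
                for x in range(left, right)
            )
            if changed:
                # Map proxy coordinates to high-res coordinates.
                boxes.append((left * scale, top * scale, right * scale, bottom * scale))
    return boxes


# Toy 4x4 proxy where a single pixel in the top-right tile was edited.
original = [[0.0] * 4 for _ in range(4)]
edited = [row[:] for row in original]
edited[0][3] = 1.0

boxes = changed_windows(original, edited)
print(boxes)  # [(8, 0, 16, 8)]
```

In this toy case only one of the four tiles is flagged, so a refinement stage restricted to flagged tiles would touch a quarter of the high-resolution canvas. That is the kind of saving the abstract attributes to focusing on edited regions.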
