FineEdit: Fine-Grained Image Edit with Bounding Box Guidance
Haohang Xu, Lin Liu, Zhibo Zhang, Rong Cong, Xiaopeng Zhang, Qi Tian

TL;DR
FineEdit introduces a bounding-box-guided diffusion model for precise image editing, improving target localization and background consistency over methods guided by language alone.
Contribution
The paper proposes a multi-level bounding box injection technique and provides a large-scale dataset and benchmark for region-specific image editing.
Findings
Outperforms state-of-the-art models in instruction compliance.
Maintains background consistency effectively.
Demonstrates strong generalization on open benchmarks.
Abstract
Diffusion-based image editing models have achieved significant progress in real-world applications. However, conventional models typically rely on natural language prompts, which often lack the precision required to localize target objects. Consequently, these models struggle to maintain background consistency due to their global image regeneration paradigm. Recognizing that visual cues provide an intuitive means for users to highlight specific areas of interest, we utilize bounding boxes as guidance to explicitly define the editing target. This approach ensures that the diffusion model can accurately localize the target while preserving background consistency. To achieve this, we propose FineEdit, a multi-level bounding box injection method that enables the model to utilize spatial conditions more effectively. To support this high-precision guidance, we present FineEdit-1.2M, a large…
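To make the idea of box-defined editing targets and background consistency concrete, here is a minimal sketch. It is not the paper's method: the summary above does not describe how FineEdit injects boxes into the diffusion model or which metrics its benchmark uses. The box format, the helper names `box_to_mask` and `background_mae`, and the use of mean absolute error are all assumptions made for illustration. The sketch rasterizes a box into a binary mask and measures how much an edit changed pixels outside that box.

```python
from typing import List, Tuple

# Assumed conventions (not from the paper):
# a box is (x0, y0, x1, y1) in pixel coordinates with x1 and y1 exclusive,
# and an image is a grayscale grid stored as a list of rows.
Box = Tuple[int, int, int, int]
Image = List[List[float]]


def box_to_mask(height: int, width: int, box: Box) -> List[List[int]]:
    """Rasterize a bounding box into a binary mask (1 inside, 0 outside)."""
    x0, y0, x1, y1 = box
    # Clamp the box to the image so out-of-range coordinates are tolerated.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [
        [1 if (y0 <= y < y1 and x0 <= x < x1) else 0 for x in range(width)]
        for y in range(height)
    ]


def background_mae(original: Image, edited: Image, mask: List[List[int]]) -> float:
    """Mean absolute pixel difference over pixels outside the box.

    A hypothetical stand-in for a background-consistency score:
    0.0 means the background is untouched.
    """
    total, count = 0.0, 0
    for row_o, row_e, row_m in zip(original, edited, mask):
        for o, e, m in zip(row_o, row_e, row_m):
            if m == 0:
                total += abs(o - e)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    original = [[0.0] * 4 for _ in range(4)]
    mask = box_to_mask(4, 4, (1, 1, 3, 3))

    # An edit confined to the box leaves the background unchanged.
    local_edit = [
        [1.0 if mask[y][x] else original[y][x] for x in range(4)]
        for y in range(4)
    ]
    # A global regeneration shifts every pixel, including the background.
    global_edit = [[0.5] * 4 for _ in range(4)]

    print(background_mae(original, local_edit, mask))   # 0.0
    print(background_mae(original, global_edit, mask))  # 0.5
```

In this toy setup the edit confined to the box scores 0.0 and the global regeneration scores 0.5, which mirrors the contrast the abstract draws between box-guided editing and global image regeneration.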
