FineEdit: Fine-Grained Image Edit with Bounding Box Guidance

Haohang Xu; Lin Liu; Zhibo Zhang; Rong Cong; Xiaopeng Zhang; Qi Tian

arXiv:2604.10954·cs.CV·April 14, 2026

TL;DR

FineEdit introduces a bounding-box-guided diffusion model for precise image editing, improving localization and background consistency over language-based methods.

Contribution

The paper proposes a multi-level bounding box injection technique and provides a large-scale dataset and benchmark for region-specific image editing.

Findings

01. Outperforms state-of-the-art models in instruction compliance.

02. Maintains background consistency effectively.

03. Demonstrates strong generalization on open benchmarks.

Abstract

Diffusion-based image editing models have achieved significant progress in real-world applications. However, conventional models typically rely on natural language prompts, which often lack the precision required to localize target objects. In addition, because these models regenerate the entire image, they struggle to maintain background consistency. Recognizing that visual cues provide an intuitive means for users to highlight specific areas of interest, we utilize bounding boxes as guidance to explicitly define the editing target. This approach ensures that the diffusion model can accurately localize the target while preserving background consistency. To achieve this, we propose FineEdit, a multi-level bounding box injection method that enables the model to utilize spatial conditions more effectively. To support this high-precision guidance, we present FineEdit-1.2M, a large…
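
As a rough illustration of what a bounding box condition can look like in practice, the sketch below rasterizes a box into a binary mask, pools that mask to several resolutions, and measures how much an edit changed the pixels outside the box. It is not the paper's implementation: the function names (`box_to_mask`, `mask_pyramid`, `background_error`), the `(x0, y0, x1, y1)` pixel box format, the pooling factors, and the error metric are all assumptions made for this example. The abstract does not say how FineEdit injects the box into the model.

```python
import numpy as np

def box_to_mask(box, height, width):
    """Rasterize an (x0, y0, x1, y1) pixel box into a binary HxW mask (assumed format)."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.float32)
    mask[y0:y1, x0:x1] = 1.0
    return mask

def mask_pyramid(mask, factors=(1, 2, 4)):
    """Average-pool the mask once per factor, e.g. one copy per feature resolution."""
    levels = []
    for f in factors:
        h, w = mask.shape[0] // f, mask.shape[1] // f
        pooled = mask[:h * f, :w * f].reshape(h, f, w, f).mean(axis=(1, 3))
        levels.append(pooled)
    return levels

def background_error(original, edited, mask):
    """Mean absolute pixel difference outside the box (hypothetical consistency metric)."""
    outside = mask == 0
    return float(np.abs(original - edited)[outside].mean())

# Toy usage: an 8x8 RGB image where only the pixels inside the box are edited.
original = np.zeros((8, 8, 3), dtype=np.float32)
edited = original.copy()
mask = box_to_mask((2, 2, 6, 6), 8, 8)
edited[2:6, 2:6] = 1.0
levels = mask_pyramid(mask)
error = background_error(original, edited, mask)
```

In this toy case the edit stays inside the box, so the background error is zero. An editor that regenerates the whole image would typically produce a nonzero value here.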
