DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model

Shibo Hong; Boxian Ai; Jun Kuang; Wei Wang; FengJiao Chen; Zhongyuan Peng; Chenhao Huang; Yixin Cao

arXiv:2602.23622·cs.CV·May 20, 2026

TL;DR

DLEBench is a new benchmark designed to evaluate instruction-based image editing models' ability to precisely edit small objects, revealing significant performance gaps and guiding future improvements.

Contribution

The paper introduces DLEBench, the first dedicated benchmark for small-scale object editing in instruction-based image editing models, with a comprehensive evaluation protocol and diverse challenging samples.

Findings

01

Empirical results show significant performance gaps in current models' small-object editing abilities.

02

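The benchmark includes 1889 samples across seven instruction types, with target objects occupying only 1%-10% of the image area and complex scenarios such as partial occlusion and multi-object editing.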

03

A dual-mode evaluation framework addresses the misalignment between automated and human judgments.

Abstract

Significant progress has been made in the field of Instruction-based Image Editing Models (IIEMs). However, while these models demonstrate plausible adherence to instructions and strong reasoning ability on current benchmarks, their ability to edit small objects remains underexplored, despite its importance for precise local editing and refining details in both real and generated images. In this paper, we introduce DeepLookEditBench (DLEBench), the first benchmark dedicated to assessing the abilities of IIEMs in editing small-scale objects. Specifically, we construct a challenging testbed comprising 1889 samples across seven instruction types. In these samples, target objects occupy only 1%-10% of the image area, covering complex scenarios such as partial occlusion and multi-object editing. To ensure robust evaluation on this benchmark, we propose an evaluation protocol with refined…
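
The 1%-10% size criterion in the abstract can be illustrated with a minimal sketch. It assumes the target's size is measured as the fraction of pixels covered by a binary object mask, with both bounds inclusive; the abstract does not say how the authors measure area, and the names `area_fraction` and `is_small_scale` are hypothetical, not taken from the paper or its dataset.

```python
from typing import Sequence


def area_fraction(mask: Sequence[Sequence[int]]) -> float:
    """Return the fraction of pixels in a binary mask marked as the target object."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask must contain at least one pixel")
    covered = sum(1 for row in mask for px in row if px)
    return covered / total


def is_small_scale(mask: Sequence[Sequence[int]],
                   low: float = 0.01,
                   high: float = 0.10) -> bool:
    """Check the assumed criterion: object covers between 1% and 10% of the image."""
    return low <= area_fraction(mask) <= high


# Toy 10x10 image whose target object is a 2x2 patch in the top-left corner.
small_object_mask = [[1 if (r < 2 and c < 2) else 0 for c in range(10)]
                     for r in range(10)]

# Toy 10x10 image whose target object fills the top half.
large_object_mask = [[1 if r < 5 else 0 for _ in range(10)]
                     for r in range(10)]
```

Under these assumptions the 2x2 patch covers 4 of 100 pixels and falls inside the range, while the half-image object does not. A bounding-box measure would give a different, generally larger, fraction for non-rectangular objects.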

Code & Models

Datasets

SPUH/DLEBench
dataset · 493 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis