Omni IIE Bench: Benchmarking the Practical Capabilities of Image Editing Models

Yujia Yang; Yuanxiang Wang; Zhenyu Guan; Tiankun Yang; Chenxi Bao; Haopeng Jin; Jinwen Luo; Xinyu Zuo; Lisheng Duan; Haijin Liang; Jin Ma; Xinming Wang; Ruiwen Tao; Hongzhu Yi

arXiv:2603.16944·cs.CV·March 19, 2026

TL;DR

Omni IIE Bench is a new benchmark designed to evaluate the consistency of image editing models across tasks of different semantic complexities, revealing significant performance gaps in current models.

Contribution

The paper introduces a dual-track diagnostic benchmark, built with rigorous human filtering, for assessing and diagnosing the performance of Instruction-based Image Editing (IIE) models across semantic scales.

Findings

01. Most models perform worse on high-semantic-scale tasks.

02. The benchmark reveals a significant performance gap across models.

03. The benchmark provides diagnostic tools for improving IIE model reliability.

Abstract

While Instruction-based Image Editing (IIE) has achieved significant progress, existing benchmarks pursue task breadth via mixed evaluations. This paradigm obscures a failure mode that is critical in professional applications: the inconsistent performance of models across tasks of varying semantic scales. To address this gap, we introduce Omni IIE Bench, a high-quality, human-annotated benchmark specifically designed to diagnose the editing consistency of IIE models in practical application scenarios. Omni IIE Bench features an innovative dual-track diagnostic design: (1) Single-turn Consistency, comprising shared-context task pairs of attribute modification and entity replacement; and (2) Multi-turn Coordination, involving continuous dialogue tasks that traverse semantic scales. The benchmark is constructed via an exceptionally rigorous multi-stage human filtering process,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning