UniREditBench: A Unified Reasoning-based Image Editing Benchmark

Feng Han; Yibin Wang; Chenglin Li; Zheming Liang; Dianyi Wang; Yang Jiao; Zhipeng Wei; Chao Gong; Cheng Jin; Jingjing Chen; Jiaqi Wang

arXiv:2511.01295 · cs.CV · November 25, 2025

TL;DR

UniREditBench is a comprehensive, multimodal benchmark designed to evaluate reasoning-based image editing capabilities across diverse scenarios, addressing limitations of existing benchmarks with new evaluation methods and a large synthetic dataset.

Contribution

The paper introduces UniREditBench, a unified benchmark with multimodal evaluation and a large synthetic dataset, enabling systematic assessment of reasoning in image editing models.

Findings

01. Enhanced evaluation reliability with multimodal dual-reference approach.

02. Significant performance improvements of fine-tuned models on diverse scenarios.

03. Benchmarking reveals strengths and weaknesses of current image editing models.

Abstract

Recent advances in multi-modal generative models have driven substantial improvements in image editing. However, current generative models still struggle with handling diverse and complex image editing tasks that require implicit reasoning, underscoring the need for a comprehensive benchmark to systematically assess their performance across various reasoning scenarios. Existing benchmarks primarily focus on single-object attribute transformation in realistic scenarios, which, while effective, encounter two key challenges: (1) they largely overlook multi-object interactions as well as game-world scenarios that involve human-defined rules, which are common in real-life applications; (2) they only rely on textual references to evaluate the generated images, potentially leading to systematic misjudgments, especially in complex reasoning scenarios. To this end, this work proposes…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Ethics and Social Impacts of AI