GEditBench v2: A Human-Aligned Benchmark for General Image Editing

Zhangqi Jiang; Zheng Sun; Xianfang Zeng; Yufeng Yang; Xuanyang Zhang; Yongliang Wu; Wei Cheng; Gang Yu; Xu Yang; Bihan Wen

arXiv:2603.28547·cs.CV·March 31, 2026

TL;DR

GEditBench v2 is a comprehensive, human-aligned benchmark for evaluating general image editing models across diverse tasks and out-of-distribution instructions, featuring a new visual consistency assessment model.

Contribution

The paper introduces GEditBench v2, a benchmark of 1,200 real-world user queries; proposes PVC-Judge, a pairwise assessment model for visual consistency; and benchmarks 16 editing models to reveal their current limitations.

Findings

01 PVC-Judge achieves state-of-the-art performance and surpasses GPT-5.1.

02 GEditBench v2 covers 23 tasks, including out-of-distribution instructions.

03 Benchmarking reveals critical limitations of current image editing models.

Abstract

Recent advances in image editing have enabled models to handle complex instructions with impressive realism. However, existing evaluation frameworks lag behind: current benchmarks suffer from narrow task coverage, while standard metrics fail to adequately capture visual consistency, i.e., the preservation of identity, structure, and semantic coherence between edited and original images. To address these limitations, we introduce GEditBench v2, a comprehensive benchmark with 1,200 real-world user queries spanning 23 tasks, including a dedicated open-set category for unconstrained, out-of-distribution editing instructions beyond predefined tasks. Furthermore, we propose PVC-Judge, an open-source pairwise assessment model for visual consistency, trained via two novel region-decoupled preference data synthesis pipelines. We also construct VCReward-Bench using expert-annotated preference…
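
For readers unfamiliar with pairwise assessment, the sketch below shows one common way to score a pairwise judge against human preference labels: the fraction of pairs on which the judge picks the same image as the annotators. This is only an illustration under stated assumptions. The function name `pairwise_agreement`, the `"A"`/`"B"` label format, and the toy data are hypothetical and are not taken from the paper or its repository, and the paper's actual evaluation protocol may differ.

```python
from typing import Sequence


def pairwise_agreement(judge_prefs: Sequence[str], human_prefs: Sequence[str]) -> float:
    """Return the fraction of pairs where the judge matches the human label.

    Each entry is a hypothetical label, "A" or "B", naming the preferred
    edited image in one comparison pair.
    """
    if len(judge_prefs) != len(human_prefs):
        raise ValueError("judge and human preference lists must have equal length")
    if not judge_prefs:
        raise ValueError("at least one comparison pair is required")
    # Count the pairs on which both sides chose the same image.
    matches = sum(j == h for j, h in zip(judge_prefs, human_prefs))
    return matches / len(judge_prefs)


# Toy data (made up for illustration): the judge disagrees on the third pair.
judge = ["A", "B", "A", "B"]
human = ["A", "B", "B", "B"]
score = pairwise_agreement(judge, human)
```

With the toy labels above, the judge agrees with the annotators on three of four pairs.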

Code & Models

Repositories

zhangqijiang07/GEditBench_v2 (GitHub)

Models

GEditBench-v2/PVC-Judge (Hugging Face model, 18 downloads, 2 likes)
