Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

Xuehai Bai; Yang Shi; Yi-Fan Zhang; Xuanyu Zhu; Yuran Wang; Yifan Dai; Xinyu Liu; Yiyan Ji; Xiaoling Gu; Yuanxing Zhang

arXiv:2605.13062 · cs.CV · May 14, 2026

TL;DR

This paper introduces Edit-Compass and EditReward-Compass, a unified evaluation suite for image editing and reward modeling. It targets two weaknesses of existing benchmarks: tasks that are too easy for strong frontier models, and evaluation protocols that are coarse-grained or unrealistic for practical RL settings.

Contribution

The authors present a unified evaluation suite for image editing and reward modeling, built on carefully annotated instances and multidimensional scoring, with the aim of reflecting human judgment more faithfully than existing benchmarks.

Findings

01

Edit-Compass contains 2,388 carefully annotated instances spanning six progressively challenging categories.

02

EditReward-Compass includes 2,251 preference pairs for evaluating reward models under realistic settings.

03

Employs a fine-grained, structured evaluation framework.

Abstract

Recent image editing models have achieved remarkable progress in instruction following, multimodal understanding, and complex visual editing. However, existing benchmarks often fail to faithfully reflect human judgment, especially for strong frontier models, due to limited task difficulty and coarse-grained evaluation protocols. In parallel, reward models have become increasingly important for RL-based image editing optimization, yet existing reward model benchmarks still rely on unrealistic evaluation settings that deviate from practical RL scenarios. These limitations hinder reliable assessment of both image editing models and reward models. To address these challenges, we introduce Edit-Compass and EditReward-Compass, a unified evaluation suite for image editing and reward modeling. Edit-Compass contains 2,388 carefully annotated instances spanning six progressively challenging task…

Code & Models

Repositories

bxhsort/Edit-Compass-and-EditReward-Compass (GitHub)
