SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning

Yancheng Long; Yankai Yang; Hongyang Wei; Wei Chen; Tianke Zhang; Haonan Fan; Changyi Liu; Kaiyu Jiang; Jiankang Chen; Kaiyu Tang; Bin Wen; Fan Yang; Tingting Gao; Han Li; Shuo Yang

arXiv:2602.07458·cs.CV·May 14, 2026

TL;DR

SpatialReward introduces an explicit spatial reasoning reward model that improves evaluation accuracy and enhances online RL performance in complex image editing tasks by addressing the perception gap.

Contribution

The paper presents SpatialReward, a reward model that uses explicit spatial reasoning to improve evaluation and online RL for image editing, trained on a curated 260k spatial-aware dataset.

Findings

01 SpatialReward achieves state-of-the-art results on MMRB2 and EditReward-Bench.

02 It outperforms proprietary evaluators and improves online RL performance on image editing.

03 Spatial reasoning is shown to be crucial for effective image editing alignment.

Abstract

Online Reinforcement Learning (RL) offers a promising avenue for complex image editing but is currently constrained by the scarcity of reliable and fine-grained reward signals. Existing evaluators frequently struggle with a critical perception gap we term "Attention Collapse," where models neglect cross-image comparisons and fail to capture fine-grained details, resulting in inaccurate perception and miscalibrated scores. To address these limitations, we propose SpatialReward, a reward model that enforces precise verification via explicit spatial reasoning. By anchoring reasoning to predicted edit regions, SpatialReward grounds semantic judgments in pixel-level evidence, significantly enhancing evaluative accuracy. Trained on a curated 260k spatial-aware dataset, our model achieves state-of-the-art performance on MMRB2 and EditReward-Bench, and outperforms proprietary evaluators on our…
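To make the idea of grounding a judgment in pixel-level evidence concrete, here is a minimal sketch in Python. It is not the SpatialReward method: the paper's model predicts edit regions, whereas this toy stands in plain pixel differencing on grayscale images stored as nested lists. The names `changed_region` and `localized_change_fraction`, the image format, and the threshold of 16 are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical toy formats, not taken from the paper.
Image = List[List[int]]          # grayscale rows of 0-255 intensities
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def _check_shapes(source: Image, edited: Image) -> None:
    if len(source) != len(edited) or any(
        len(a) != len(b) for a, b in zip(source, edited)
    ):
        raise ValueError("source and edited images must have the same shape")


def changed_region(source: Image, edited: Image, threshold: int = 16) -> Optional[Box]:
    """Tightest box around pixels whose intensity differs by more than `threshold`."""
    _check_shapes(source, edited)
    width = len(source[0]) if source else 0
    rows = [
        y for y, (a, b) in enumerate(zip(source, edited))
        if any(abs(p - q) > threshold for p, q in zip(a, b))
    ]
    cols = [
        x for x in range(width)
        if any(abs(a[x] - b[x]) > threshold for a, b in zip(source, edited))
    ]
    if not rows:
        return None  # nothing changed beyond the threshold
    return (rows[0], cols[0], rows[-1], cols[-1])


def localized_change_fraction(
    source: Image, edited: Image, box: Box, threshold: int = 16
) -> float:
    """Fraction of changed pixels that fall inside `box` (0.0 if nothing changed)."""
    _check_shapes(source, edited)
    top, left, bottom, right = box
    changed = inside = 0
    for y, (a, b) in enumerate(zip(source, edited)):
        for x, (p, q) in enumerate(zip(a, b)):
            if abs(p - q) > threshold:
                changed += 1
                if top <= y <= bottom and left <= x <= right:
                    inside += 1
    return inside / changed if changed else 0.0
```

Under these assumptions, a fraction well below 1.0 for a claimed edit region suggests that the edit also touched pixels outside it, which is the kind of cross-image, region-level check a score anchored to an edit region could draw on.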

Code & Models

Models

SpatialReward/SpatialReward-8B
model · 46 downloads · 1 like

Datasets

SpatialReward/SpatialReward-Train
dataset · 605 downloads
