RS-HyRe-R1: A Hybrid Reward Mechanism to Overcome Perceptual Inertia for Remote Sensing Images Understanding

Gaozhi Zhou; Hu He; Peng Shen; Jipeng Zhang; Liujue Zhang; Linrui Xu; Zeyuan Wang; Ziyu Li; Xuezhi Cui; Wang Guo; Haifeng Li

arXiv:2604.17504 · cs.CV · April 21, 2026



TL;DR

This paper introduces RS-HyRe-R1, a hybrid reward framework for remote sensing image understanding that mitigates perceptual inertia, enhances reasoning depth, and achieves state-of-the-art results on multiple vision-language tasks.

Contribution

It proposes a novel hybrid reward mechanism to address perceptual inertia in remote sensing vision-language models, improving reasoning and generalization.

Findings

01. Outperforms models with up to 7B parameters on referring expression comprehension (REC), open-vocabulary detection (OVD), and visual question answering (VQA).

02. Achieves state-of-the-art performance with only 3B parameters.

03. Demonstrates strong zero-shot generalization, surpassing competing models.

Abstract

Reinforcement learning (RL) post-training substantially improves remote sensing vision-language models (RS-VLMs). However, when handling complex remote sensing imagery (RSI) requiring exhaustive visual scanning, models tend to rely on localized salient cues for rapid inference. We term this RL-induced bias "perceptual inertia". Driven by reward maximization, models favor quick outcome fitting, leading to two limitations: cognitively, overreliance on specific features impedes complete evidence construction; operationally, models struggle to flexibly shift visual focus across tasks. To address this bias and encourage comprehensive visual evidence mining, we propose RS-HyRe-R1, a hybrid reward framework for RSI understanding. It introduces: (1) a spatial reasoning activation reward that enforces structured visual reasoning; (2) a perception correctness reward that provides adaptive quality…
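To make the idea of a "hybrid" reward concrete, the sketch below combines a structure term with a graded perception term. It is illustrative only: the abstract is truncated and gives no formulas, so the think/answer template, the use of box IoU as the "adaptive quality" signal for a REC-style task, the function names (`spatial_reasoning_reward`, `perception_correctness_reward`, `hybrid_reward`), and the weights are all hypothetical assumptions, not the authors' implementation.

```python
import re
from typing import Sequence

# Hypothetical structure template (assumption): reasoning inside <think>...</think>,
# followed by the final answer inside <answer>...</answer>.
THINK_ANSWER = re.compile(r"^<think>.+?</think>\s*<answer>.+?</answer>$", re.DOTALL)


def spatial_reasoning_reward(completion: str) -> float:
    """Hypothetical structure reward: 1.0 if the assumed template is followed, else 0.0."""
    return 1.0 if THINK_ANSWER.match(completion.strip()) else 0.0


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def perception_correctness_reward(pred_box: Sequence[float], gt_box: Sequence[float]) -> float:
    """Hypothetical graded correctness: IoU in [0, 1] instead of a binary hit/miss."""
    return box_iou(pred_box, gt_box)


def hybrid_reward(
    completion: str,
    pred_box: Sequence[float],
    gt_box: Sequence[float],
    w_reason: float = 0.5,  # assumed weight for the structure term
    w_percep: float = 1.0,  # assumed weight for the perception term
) -> float:
    """Weighted sum of the two hypothetical reward terms."""
    return (w_reason * spatial_reasoning_reward(completion)
            + w_percep * perception_correctness_reward(pred_box, gt_box))


if __name__ == "__main__":
    out = "<think>The runway runs along the left edge.</think><answer>[10, 10, 50, 50]</answer>"
    print(hybrid_reward(out, [10, 10, 50, 50], [10, 10, 50, 50]))
```

A graded term such as IoU gives partial credit to near-miss localizations, whereas a binary correctness check would reward only exact hits. This is one plausible reading of "adaptive quality"; the paper may define it differently.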


Code & Models

Repositories

geox-lab/RS-HyRe-R1 (GitHub)
