Asking like Socrates: Socrates helps VLMs understand remote sensing images

Run Shao; Ziyu Li; Zhaoyang Zhang; Linrui Xu; Xinran He; Hongyuan Yuan; Bolei He; Yongxing Dai; Yiming Yan; Yijun Chen; Wang Guo; Haifeng Li

arXiv:2511.22396 · cs.CV · April 9, 2026

TL;DR

This paper introduces RS-EoT, an iterative, evidence-seeking reasoning paradigm for remote sensing vision-language tasks, addressing pseudo reasoning caused by the Glance Effect and achieving state-of-the-art results.

Contribution

It proposes a novel SocraticAgent system with a two-stage reinforcement learning strategy to improve genuine evidence-based reasoning in remote sensing models.

Findings

01. RS-EoT achieves state-of-the-art performance on multiple benchmarks.

02. The approach mitigates the Glance Effect, enabling more accurate reasoning.

03. Iterative reasoning cycles are confirmed through analysis.

Abstract

Recent multimodal reasoning models, inspired by DeepSeek-R1, have significantly advanced vision-language systems. However, in remote sensing (RS) tasks, we observe widespread pseudo reasoning: models narrate the process of reasoning rather than genuinely reason toward the correct answer based on visual evidence. We attribute this to the Glance Effect, where a single, coarse perception of large-scale RS imagery results in incomplete understanding and reasoning based on linguistic self-consistency instead of visual evidence. To address this, we propose RS-EoT (Remote Sensing Evidence-of-Thought), a language-driven, iterative visual evidence-seeking paradigm. To instill this paradigm, we propose SocraticAgent, a self-play multi-agent system that synthesizes reasoning traces via alternating cycles of reasoning and visual inspection. To enhance and generalize these patterns, we propose a…
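The alternating cycle of reasoning and visual inspection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Trace`, `evidence_seeking_loop`, `ask`, `inspect`, and `answer` are hypothetical, and the "image" is replaced by a toy lookup table so the control flow can run without a model. It only shows the general shape of a loop in which language-driven questions trigger repeated looks at the image before an answer is given.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Trace:
    """Question/evidence pairs gathered for one query (hypothetical structure)."""
    steps: List[Tuple[str, str]] = field(default_factory=list)


def evidence_seeking_loop(
    ask: Callable[[str, Trace], Optional[str]],
    inspect: Callable[[str], str],
    answer: Callable[[str, Trace], str],
    query: str,
    max_rounds: int = 4,
) -> Tuple[str, Trace]:
    """Alternate between posing a question and inspecting the image for evidence."""
    trace = Trace()
    for _ in range(max_rounds):
        question = ask(query, trace)      # reasoning step: what is still unknown?
        if question is None:              # the reasoner has enough evidence
            break
        evidence = inspect(question)      # perception step: look again at the image
        trace.steps.append((question, evidence))
    return answer(query, trace), trace


def demo() -> Tuple[str, Trace]:
    # Toy stand-in for an image: each question maps to what an inspector would report.
    scene = {
        "How many runways are visible?": "two runways",
        "Are aircraft parked near the terminal?": "five aircraft near the terminal",
    }
    pending = list(scene)

    def ask(query: str, trace: Trace) -> Optional[str]:
        i = len(trace.steps)
        return pending[i] if i < len(pending) else None

    def inspect(question: str) -> str:
        return scene[question]

    def answer(query: str, trace: Trace) -> str:
        # The final answer is built only from collected evidence.
        return "; ".join(evidence for _, evidence in trace.steps)

    return evidence_seeking_loop(ask, inspect, answer, "Describe the airport.")
```

In a real system the three callables would presumably be backed by a vision-language model; here they are stubs, so the sketch says nothing about how SocraticAgent or the reinforcement learning stages actually work.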

Code & Models

Models

ShaoRun/RS-EoT-7B
model· 45 dl· ♡ 5

Datasets

ShaoRun/RS-EoT-4K
dataset· 80 dl
