GeoSolver: Scaling Test-Time Reasoning in Remote Sensing with Fine-Grained Process Supervision

Lang Sun; Ronghao Fu; Zhuoran Duan; Haoran Liu; Xueyan Liu; Bo Yang

arXiv:2603.09551 · cs.CV · March 11, 2026

TL;DR

GeoSolver introduces a process-supervised reinforcement learning framework with a large-scale dataset and a token-level reward model to improve reasoning faithfulness and scalability in remote sensing interpretation.

Contribution

The paper presents GeoSolver, a framework that combines process supervision, a large-scale dataset (Geo-PRM-2M), and reinforcement learning to improve the accuracy and scalability of reasoning in remote sensing models.

Findings

01. GeoSolver-9B achieves state-of-the-art results on remote sensing benchmarks.

02. GeoPRM enables robust test-time scaling and improves general-purpose VLMs.

03. The framework ensures verifiable, faithful reasoning steps in remote sensing tasks.

Abstract

While Vision-Language Models (VLMs) have significantly advanced remote sensing interpretation, enabling them to perform complex, step-by-step reasoning remains highly challenging. Recent efforts to introduce Chain-of-Thought (CoT) reasoning to this domain have shown promise, yet ensuring the visual faithfulness of these intermediate steps remains a critical bottleneck. To address this, we introduce GeoSolver, a novel framework that transitions remote sensing reasoning toward verifiable, process-supervised reinforcement learning. We first construct Geo-PRM-2M, a large-scale, token-level process supervision dataset synthesized via entropy-guided Monte Carlo Tree Search (MCTS) and targeted visual hallucination injection. Building upon this dataset, we train GeoPRM, a token-level process reward model (PRM) that provides granular faithfulness feedback. To effectively leverage these…
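
To make the idea of test-time scaling with a token-level reward model concrete, here is a minimal sketch of best-of-N reranking. It is not the paper's implementation: the `Candidate` type, the `aggregate` and `best_of_n` helpers, the assumed score range of [0, 1], and the min/mean aggregation rules are all hypothetical choices made for illustration, and the scores below are made-up numbers rather than GeoPRM outputs.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """One sampled reasoning chain (hypothetical container, not from the paper)."""
    text: str                   # reasoning steps plus final answer
    token_rewards: List[float]  # assumed per-token faithfulness scores in [0, 1]


def aggregate(token_rewards: Sequence[float], mode: str = "min") -> float:
    """Collapse per-token scores into a single chain-level score.

    "min" penalizes a chain for its single least faithful token;
    "mean" averages over the whole chain. Both rules are assumptions.
    """
    if not token_rewards:
        raise ValueError("candidate has no token rewards")
    if mode == "min":
        return min(token_rewards)
    if mode == "mean":
        return sum(token_rewards) / len(token_rewards)
    raise ValueError(f"unknown aggregation mode: {mode}")


def best_of_n(candidates: Sequence[Candidate], mode: str = "min") -> Candidate:
    """Return the candidate with the highest aggregated reward."""
    if not candidates:
        raise ValueError("no candidates to rank")
    return max(candidates, key=lambda c: aggregate(c.token_rewards, mode))


if __name__ == "__main__":
    # Made-up scores: chain A contains one low-scoring (possibly hallucinated) token.
    candidates = [
        Candidate("chain A", [0.9, 0.8, 0.2, 0.9]),
        Candidate("chain B", [0.7, 0.7, 0.6, 0.7]),
    ]
    print(best_of_n(candidates, "min").text)   # chain B (min 0.6 vs 0.2)
    print(best_of_n(candidates, "mean").text)  # chain A (mean 0.7 vs 0.675)
```

The two aggregation modes disagree on this toy input, which shows why the choice matters: a min rule rejects a chain for one unfaithful token, while a mean rule can let it through. Sampling more candidates (larger N) is the knob that spends additional test-time compute.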

Taxonomy

Topics: Remote-Sensing Image Classification · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning