GeoSolver: Scaling Test-Time Reasoning in Remote Sensing with Fine-Grained Process Supervision
Lang Sun, Ronghao Fu, Zhuoran Duan, Haoran Liu, Xueyan Liu, Bo Yang

TL;DR
GeoSolver introduces a process-supervised reinforcement learning framework with a large-scale dataset and a token-level reward model to improve reasoning faithfulness and scalability in remote sensing interpretation.
Contribution
It presents GeoSolver, a framework that combines a large-scale process supervision dataset (Geo-PRM-2M), a token-level process reward model (GeoPRM), and process-supervised reinforcement learning to improve reasoning accuracy and scalability in remote sensing models.
Findings
GeoSolver-9B achieves state-of-the-art results on remote sensing benchmarks.
GeoPRM enables robust test-time scaling and improves general-purpose VLMs.
The framework ensures verifiable, faithful reasoning steps in remote sensing tasks.
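One common way a token-level process reward model supports test-time scaling is best-of-N selection. The policy samples several reasoning chains, the PRM scores every token, and the chain whose weakest token scores highest is kept. The sketch below illustrates that idea under stated assumptions. The scorer interface, the min-aggregation rule, and the names `chain_score` and `best_of_n` are hypothetical and are not taken from the GeoSolver paper.

```python
from typing import Callable, Sequence

# Hypothetical interface: maps a candidate reasoning chain to per-token
# faithfulness scores in [0, 1]. It stands in for a token-level PRM such as
# GeoPRM; the real model's API is not described here.
TokenScorer = Callable[[str], Sequence[float]]


def chain_score(token_scores: Sequence[float]) -> float:
    """Aggregate token-level scores into a single chain score.

    Assumption: take the minimum, so one unfaithful token (for example a
    hallucinated object) is enough to sink the whole chain.
    """
    if not token_scores:
        raise ValueError("empty score sequence")
    return min(token_scores)


def best_of_n(candidates: Sequence[str], scorer: TokenScorer) -> str:
    """Return the candidate whose weakest token receives the highest score."""
    if not candidates:
        raise ValueError("no candidates")
    return max(candidates, key=lambda c: chain_score(scorer(c)))


# Toy usage: a lookup table plays the role of the PRM.
toy_scores = {
    "chain A": [0.9, 0.95, 0.2],  # one low-scoring (unfaithful) token
    "chain B": [0.8, 0.85, 0.8],  # uniformly reasonable
}
print(best_of_n(list(toy_scores), toy_scores.__getitem__))  # chain B
```

Min-aggregation is only one choice; averaging or taking the product of token scores are common alternatives that penalize a single bad step less harshly.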
Abstract
While Vision-Language Models (VLMs) have significantly advanced remote sensing interpretation, enabling them to perform complex, step-by-step reasoning remains highly challenging. Recent efforts to introduce Chain-of-Thought (CoT) reasoning to this domain have shown promise, yet ensuring the visual faithfulness of these intermediate steps remains a critical bottleneck. To address this, we introduce GeoSolver, a novel framework that transitions remote sensing reasoning toward verifiable, process-supervised reinforcement learning. We first construct Geo-PRM-2M, a large-scale, token-level process supervision dataset synthesized via entropy-guided Monte Carlo Tree Search (MCTS) and targeted visual hallucination injection. Building upon this dataset, we train GeoPRM, a token-level process reward model (PRM) that provides granular faithfulness feedback. To effectively leverage these…
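The abstract says Geo-PRM-2M is synthesized with entropy-guided MCTS but does not give the selection rule. One plausible reading is that the search branches where the policy is most uncertain, measured by the entropy of its next-token distribution. The sketch below follows that assumption. The functions `token_entropy` and `branch_points` and the top-k rule are hypothetical illustrations, not the authors' procedure.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def branch_points(step_dists: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k highest-entropy positions, in sequence order.

    Assumption: MCTS expansion branches only at the positions where the policy
    is most uncertain, instead of at every token or at fixed step boundaries.
    """
    entropies = [token_entropy(d) for d in step_dists]
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])


# Toy usage: four decoding positions with illustrative distributions.
dists = [
    [0.98, 0.01, 0.01],  # confident
    [0.34, 0.33, 0.33],  # near-uniform, highest entropy
    [0.90, 0.05, 0.05],  # fairly confident
    [0.50, 0.50, 0.00],  # two-way split
]
print(branch_points(dists, 2))  # [1, 3]
```

Concentrating rollouts at high-entropy positions spends the search budget where alternative continuations are most likely to diverge, which is the usual motivation for entropy-guided expansion.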
Taxonomy
Topics: Remote-Sensing Image Classification · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
