GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes

Di Wang; Shunyu Liu; Wentao Jiang; Fengxiang Wang; Yi Liu; Xiaolei Qin; Zhiming Luo; Chaoyang Zhou; Haonan Guo; Jing Zhang; Bo Du; Dacheng Tao; Liangpei Zhang

arXiv:2511.22645·cs.CV·March 16, 2026

TL;DR

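GeoZero is a framework for geospatial reasoning in multimodal large language models that requires no chain-of-thought (CoT) supervision. Supervised fine-tuning on GeoZero-Instruct supplies preliminary geospatial knowledge, and reinforcement learning on GeoZero-Hard then stimulates deeper reasoning, avoiding the annotation cost and human bias of curated CoT data.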

Contribution

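The paper presents GeoZero, an approach that enables geospatial reasoning without predefined CoT data. It contributes two datasets, GeoZero-Instruct for supervised fine-tuning and GeoZero-Hard for the reinforcement learning stage, together with a novel reinforcement learning method to improve geospatial understanding.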

Findings

01. Outperforms existing state-of-the-art methods on remote sensing benchmarks.

02. Fosters universal emergent reasoning capabilities across diverse geospatial tasks.

03. Demonstrates effective reasoning without human-annotated chain-of-thought data.

Abstract

Multimodal large language models (MLLMs) have undergone rapid development in advancing geospatial scene understanding. Recent studies have sought to enhance the reasoning capabilities of remote sensing MLLMs, typically through cold-start training with elaborately curated chain-of-thought (CoT) data. However, this approach not only incurs substantial annotation costs but also introduces human biases that may limit the diversity of model reasoning. To address these challenges, we propose GeoZero, a framework that enables MLLMs to perform geospatial reasoning without any predefined CoT supervision. Specifically, we construct two datasets, GeoZero-Instruct and GeoZero-Hard. GeoZero-Instruct allows the model to acquire preliminary geospatial knowledge through supervised fine-tuning, while GeoZero-Hard stimulates deep reasoning during the subsequent reinforcement learning stage. Furthermore,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Domain Adaptation and Few-Shot Learning