GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards
Kyeongjin Ahn, Seungeon Lee, Krishna P. Gummadi, Meeyoung Cha

TL;DR
GeoX introduces a self-play framework that enhances geospatial reasoning in models by using executable programs and verifiable rewards, reducing reliance on costly human annotations.
Contribution
It presents a novel self-play approach with executable programs and a new benchmark for geospatial understanding, improving model performance without large curated datasets.
Findings
GeoX improves base VLMs by up to 5.5 points on average.
The framework matches or exceeds baselines trained on millions of curated examples.
A new benchmark for geospatial understanding is released.
Abstract
Geospatial reasoning requires solving image-grounded problems over the complex spatial structure of a scene. However, developing this capability is hindered by the cost of annotating a vast and combinatorial question space. We propose GeoX, a self-play framework that acquires spatial logic through executable programs that yield verifiable rewards, without relying on large-scale human-curated data. Given a satellite or aerial image, our framework employs a single multimodal policy that proposes spatial problems as executable programs and solves them under three reasoning modes (abduction, deduction, and induction) over spatial primitives and an image understanding tool. A verifier executes each program and converts the result into a reward signal that jointly optimizes the two roles via reinforcement learning. GeoX consistently improves its base VLMs by up to 5.5 points on average, matching or exceeding…
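The verifier step described in the abstract can be illustrated with a minimal sketch. This is not the GeoX implementation: the primitive names (`count`, `distance`), the dictionary of detected boxes standing in for the image understanding tool, the use of a restricted `eval`, and the binary reward are all assumptions made for illustration. It covers only a deduction-style check, where the solver predicts the output of a proposed program and the verifier executes that program to score the prediction.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def centroid(box: Box) -> Tuple[float, float]:
    """Return the center point of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def distance(a: Box, b: Box) -> float:
    """Euclidean distance between the centroids of two boxes."""
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    return math.hypot(ax - bx, ay - by)


def count(boxes: List[Box]) -> int:
    """Number of detected objects in a list."""
    return len(boxes)


# Assumed set of spatial primitives exposed to proposed programs.
PRIMITIVES = {"count": count, "distance": distance}


def run_program(program: str, objects: Dict[str, List[Box]]):
    """Evaluate a proposed program (a single expression) with only the
    primitives and the detected objects in scope."""
    env = {"__builtins__": {}, **PRIMITIVES, "objects": objects}
    return eval(program, env)


def deduction_reward(program: str, objects: Dict[str, List[Box]],
                     solver_answer, tol: float = 1e-6) -> float:
    """Binary reward: 1.0 if the solver's answer matches the executed
    program's output, 0.0 otherwise or if the program fails to run."""
    try:
        truth = run_program(program, objects)
    except Exception:
        return 0.0  # a program that cannot execute earns no reward
    if isinstance(truth, (int, float)) and isinstance(solver_answer, (int, float)):
        return 1.0 if abs(truth - solver_answer) <= tol else 0.0
    return 1.0 if truth == solver_answer else 0.0


# Toy stand-in for the image understanding tool's output (assumed format).
detections = {
    "building": [(0, 0, 2, 2), (3, 4, 5, 6)],
    "road": [(0, 0, 10, 1)],
}
proposed = 'count(objects["building"])'
reward = deduction_reward(proposed, detections, solver_answer=2)
```

Because the ground truth comes from executing the program rather than from a human label, the reward is verifiable by construction. A real system would need a properly sandboxed executor, since a restricted `eval` is not a security boundary.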
