Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach

Biao Wu; Meng Fang; Ling Chen; Ke Xu; Tao Cheng; Jun Wang

arXiv:2601.00388 · cs.CL · January 6, 2026

TL;DR

This paper introduces Geo-R, a reinforcement learning framework for image geolocalization that uses hierarchical geographic reasoning without relying on synthetic annotations or external retrieval, improving accuracy and interpretability.

Contribution

Geo-R is a novel retrieval-free, reinforcement learning-based approach that leverages structured geographic reasoning for scalable and interpretable image geolocalization.

Findings

01 Achieved state-of-the-art accuracy on multiple benchmarks.
02 Demonstrated strong generalization across diverse datasets.
03 Provided transparent reasoning paths for localization decisions.

Abstract

Recent advances in vision-language models have opened up new possibilities for reasoning-driven image geolocalization. However, existing approaches often rely on synthetic reasoning annotations or external image retrieval, which can limit interpretability and generalizability. In this paper, we present Geo-R, a retrieval-free framework that uncovers structured reasoning paths from existing ground-truth coordinates and optimizes geolocation accuracy via reinforcement learning. We propose the Chain of Region, a rule-based hierarchical reasoning paradigm that generates precise, interpretable supervision by mapping GPS coordinates to geographic entities (e.g., country, province, city) without relying on model-generated or synthetic labels. Building on this, we introduce a lightweight reinforcement learning strategy with coordinate-aligned rewards based on Haversine distance, enabling the…
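
To make the idea of a coordinate-aligned reward based on Haversine distance concrete, here is a minimal sketch in Python. The Haversine formula itself is standard. The reward shape (an exponential decay of distance) and the `scale_km` parameter are illustrative assumptions, as are the function names; the excerpt above does not specify how Geo-R actually turns distance into a reward.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_reward(pred, truth, scale_km=750.0):
    """Hypothetical reward in (0, 1]: 1.0 for an exact match, decaying with distance.

    The exponential form and the default scale_km are assumptions made for
    illustration only, not the paper's reward definition.
    """
    d = haversine_km(pred[0], pred[1], truth[0], truth[1])
    return math.exp(-d / scale_km)


if __name__ == "__main__":
    paris = (48.8566, 2.3522)
    london = (51.5074, -0.1278)
    print(f"Paris to London: {haversine_km(*paris, *london):.1f} km")
    print(f"Reward for predicting London when truth is Paris: "
          f"{distance_reward(london, paris):.3f}")
```

A smooth, bounded reward of this kind gives partial credit to near misses, which is one plausible reason to align the training signal with geographic distance rather than with exact-match labels.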

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization