GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

Ling Li; Yu Ye; Yao Zhou; Bingchuan Jiang; Wei Zeng

arXiv:2406.18572 · cs.CV · November 25, 2025 · 1 citation

TL;DR

GeoReasoner introduces a novel geo-localization approach using a large vision-language model enhanced with human inference and a new dataset, significantly improving accuracy at country and city levels.

Contribution

The paper presents a new dataset of locatable street views and integrates external human inference knowledge into a large vision-language model for improved geo-localization.

Findings

01. Outperforms existing LVLMs by over 25% at country level

02. Outperforms existing LVLMs by 38% at city level

03. Requires fewer training resources than StreetCLIP

Abstract

This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM) augmented with human inference knowledge. A primary challenge here is the scarcity of data for training the LVLM: existing street-view datasets often contain numerous low-quality images lacking visual clues, and lack any reasoning inference. To address the data-quality issue, we devise a CLIP-based network to quantify the degree to which street-view images are locatable, leading to the creation of a new dataset comprising highly locatable street views. To enhance reasoning inference, we integrate external knowledge obtained from real geo-localization games, tapping into valuable human inference capabilities. The data are utilized to train GeoReasoner, which undergoes fine-tuning through dedicated reasoning and location-tuning stages. Qualitative and quantitative evaluations…
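
The locatability filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the CLIP-based network computes its score, so the version below is an assumption: it scores an image by comparing its embedding against two reference embeddings (one standing for "locatable", one for "unlocatable") in a zero-shot CLIP style, then keeps images above a threshold. The function names, the `temperature` and `threshold` values, and the toy 2-D vectors are all hypothetical; real CLIP embeddings would come from an actual encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def locatability(image_emb, locatable_emb, unlocatable_emb, temperature=0.07):
    """Hypothetical score in [0, 1]: a two-way softmax over the image's
    similarity to a 'locatable' and an 'unlocatable' reference embedding.
    This is an assumed stand-in, not the paper's network."""
    s_pos = cosine(image_emb, locatable_emb) / temperature
    s_neg = cosine(image_emb, unlocatable_emb) / temperature
    m = max(s_pos, s_neg)  # subtract the max for numerical stability
    e_pos = math.exp(s_pos - m)
    e_neg = math.exp(s_neg - m)
    return e_pos / (e_pos + e_neg)


def filter_locatable(images, locatable_emb, unlocatable_emb, threshold=0.5):
    """Keep the names of images whose score reaches the threshold.
    `images` is a list of (name, embedding) pairs."""
    return [
        name
        for name, emb in images
        if locatability(emb, locatable_emb, unlocatable_emb) >= threshold
    ]


# Toy 2-D embeddings (assumed for illustration only).
LOCATABLE = [1.0, 0.0]
UNLOCATABLE = [0.0, 1.0]
IMAGES = [
    ("signpost.jpg", [0.9, 0.1]),    # rich in visual clues
    ("blank_wall.jpg", [0.1, 0.9]),  # few visual clues
]

kept = filter_locatable(IMAGES, LOCATABLE, UNLOCATABLE)
```

In this toy setup only the image whose embedding lies closer to the "locatable" reference survives the filter; a dataset built this way would then feed the fine-tuning stages the abstract describes.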

Code & Models

Repositories

lingli1996/georeasoner (PyTorch, official)

Taxonomy

Topics: Geographic Information Systems Studies · Advanced Image and Video Retrieval Techniques · Automated Road and Building Extraction