VLM-Guided Visual Place Recognition for Planet-Scale Geo-Localization
Sania Waheed, Na Min An, Michael Milford, Sarvapali D. Ramchurn, Shoaib Ehsan

TL;DR
This paper introduces a hybrid geo-localization system that combines vision-language models with retrieval-based visual place recognition, significantly improving accuracy and scalability for planet-scale image localization tasks.
Contribution
It proposes a novel framework that uses VLMs to guide and constrain visual place recognition, enhancing robustness and performance over prior methods.
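One way to picture "guide and constrain" is as a geographic filter applied before retrieval. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the VLM supplies a coarse latitude/longitude guess, that the reference database stores one descriptor and one coordinate per image, and that retrieval is plain cosine-similarity nearest neighbour. The names `constrained_retrieve`, `prior_latlon`, and `radius_km` are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def cosine(u, v):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def constrained_retrieve(query_desc, prior_latlon, database, radius_km):
    """Return the best-matching database entry within radius_km of the prior.

    prior_latlon stands in for a coarse VLM location guess (an assumption);
    database is a list of dicts with 'lat', 'lon', and 'desc' keys.
    Returns None when no reference image lies inside the search radius.
    """
    lat0, lon0 = prior_latlon
    candidates = [
        e for e in database
        if haversine_km(lat0, lon0, e["lat"], e["lon"]) <= radius_km
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda e: cosine(query_desc, e["desc"]))

# Toy data: the Sydney entry looks most like the query (perceptual aliasing),
# but the prior near Paris removes it from the candidate set.
database = [
    {"name": "paris_a", "lat": 48.8566, "lon": 2.3522, "desc": [1.0, 0.0]},
    {"name": "paris_b", "lat": 48.8600, "lon": 2.3400, "desc": [0.0, 1.0]},
    {"name": "sydney", "lat": -33.8688, "lon": 151.2093, "desc": [1.0, 0.01]},
]
match = constrained_retrieve([1.0, 0.01], (48.85, 2.35), database, radius_km=50.0)
print(match["name"])
```

Restricting the candidate set this way shrinks the search space and discards visually similar places that are geographically implausible, which is the kind of scalability and aliasing problem the abstract attributes to purely retrieval-based methods.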
Findings
Outperforms state-of-the-art methods on multiple benchmarks.
Achieves up to 4.51% improvement at street level.
Achieves up to 13.52% improvement at city level.
Abstract
Geo-localization from a single image at planet scale (essentially an advanced or extreme version of the kidnapped robot problem) is a fundamental and challenging task in applications such as navigation, autonomous driving and disaster response due to the vast diversity of locations, environmental conditions, and scene variations. Traditional retrieval-based methods for geo-localization struggle with scalability and perceptual aliasing, while classification-based approaches lack generalization and require extensive training data. Recent advances in vision-language models (VLMs) offer a promising alternative by leveraging contextual understanding and reasoning. However, while VLMs achieve high accuracy, they are often prone to hallucinations and lack interpretability, making them unreliable as standalone solutions. In this work, we propose a novel hybrid geo-localization framework that…
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Satellite Image Processing and Photogrammetry
