From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance
Jeongho Min, Dongyoung Kim, Jaehyup Lee

TL;DR
This paper introduces a training-free cross-view image retrieval method that matches street-view images to satellite images using a pretrained vision encoder and LLM guidance.
Contribution
The paper proposes a zero-shot framework for street-to-satellite retrieval that combines web-based image search, LLM inference, and pretrained vision encoders, with no additional training.
Findings
Outperforms prior learning-based methods on benchmark datasets
Enables automatic creation of semantically aligned street-satellite datasets
Operates effectively without supervised training or fine-tuning
Abstract
Cross-view image retrieval, particularly street-to-satellite matching, is a critical task for applications such as autonomous navigation, urban planning, and localization in GPS-denied environments. However, existing approaches often require supervised training on curated datasets and rely on panoramic or UAV-based images, which limits real-world deployment. In this paper, we present a simple yet effective cross-view image retrieval framework that leverages a pretrained vision encoder and a large language model (LLM), requiring no additional training. Given a monocular street-view image, our method extracts geographic cues through web-based image search and LLM-based location inference, generates a satellite query via geocoding API, and retrieves matching tiles using a pretrained vision encoder (e.g., DINOv2) with PCA-based whitening feature refinement. Despite using no ground-truth…
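The retrieval step named in the abstract (pretrained features refined by PCA-based whitening, then matched against satellite tiles) can be illustrated with a minimal sketch. This is not the authors' implementation: the random vectors stand in for DINOv2 embeddings, and the function names, the output dimension, the `eps` regularizer, and the use of cosine similarity for ranking are all assumptions made for illustration.

```python
import numpy as np


def fit_pca_whitening(feats, out_dim, eps=1e-6):
    """Estimate a PCA-whitening transform from gallery features (hypothetical helper)."""
    mean = feats.mean(axis=0)
    centered = feats - mean
    cov = centered.T @ centered / max(len(feats) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:out_dim]     # keep the top out_dim components
    # Scale each principal direction to unit variance (the whitening step).
    proj = eigvecs[:, order] / np.sqrt(eigvals[order] + eps)
    return mean, proj


def whiten(feats, mean, proj):
    """Project features with the fitted transform, then L2-normalize each row."""
    out = (feats - mean) @ proj
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)


def retrieve_top_k(query, gallery, k):
    """Rank gallery rows by cosine similarity to a unit-norm query vector."""
    scores = gallery @ query
    return np.argsort(-scores)[:k]


# Assumption: random vectors stand in for encoder embeddings of satellite tiles.
rng = np.random.default_rng(0)
gallery_raw = rng.normal(size=(200, 32))
# Assumption: the query embedding is a slightly perturbed copy of tile 17.
query_raw = gallery_raw[17] + 0.01 * rng.normal(size=32)

mean, proj = fit_pca_whitening(gallery_raw, out_dim=16)
gallery = whiten(gallery_raw, mean, proj)
query = whiten(query_raw[None, :], mean, proj)[0]

top = retrieve_top_k(query, gallery, k=5)
print(top)
```

In this sketch the whitening statistics are fitted on the gallery alone, so no labeled street-satellite pairs are needed, which is in keeping with the training-free setting the abstract describes.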
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Automated Road and Building Extraction · Robotics and Sensor-Based Localization
