From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance

Jeongho Min; Dongyoung Kim; Jaehyup Lee

arXiv:2511.09820 · cs.CV · November 14, 2025

TL;DR

This paper introduces a training-free cross-view retrieval method that matches street-view images to satellite images using a pretrained vision encoder and LLM guidance.

Contribution

It proposes a novel zero-shot framework leveraging web-based image search, LLM inference, and pretrained vision encoders for street-to-satellite retrieval without additional training.

Findings

01. Outperforms prior learning-based methods on benchmark datasets

02. Enables automatic creation of semantically aligned street-satellite datasets

03. Operates effectively without supervised training or fine-tuning

Abstract

Cross-view image retrieval, particularly street-to-satellite matching, is a critical task for applications such as autonomous navigation, urban planning, and localization in GPS-denied environments. However, existing approaches often require supervised training on curated datasets and rely on panoramic or UAV-based images, which limits real-world deployment. In this paper, we present a simple yet effective cross-view image retrieval framework that leverages a pretrained vision encoder and a large language model (LLM), requiring no additional training. Given a monocular street-view image, our method extracts geographic cues through web-based image search and LLM-based location inference, generates a satellite query via a geocoding API, and retrieves matching tiles using a pretrained vision encoder (e.g., DINOv2) with PCA-based whitening feature refinement. Despite using no ground-truth…
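The retrieval step named in the abstract (pretrained features refined by PCA-based whitening, then matched against satellite tiles) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `fit_pca_whitening`, `apply_whitening`, and `retrieve` are hypothetical, random vectors stand in for encoder features such as DINOv2 embeddings, and cosine similarity is assumed as the matching score.

```python
import numpy as np


def fit_pca_whitening(feats, dim, eps=1e-6):
    """Fit a PCA-whitening transform on gallery features of shape (n, d).

    Hypothetical helper: returns the feature mean and a (d, dim) projection
    that keeps the top `dim` principal directions and rescales each one
    to unit variance.
    """
    mean = feats.mean(axis=0)
    centered = feats - mean
    cov = centered.T @ centered / (len(feats) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dim]         # indices of the largest `dim`
    proj = eigvecs[:, order] / np.sqrt(eigvals[order] + eps)
    return mean, proj


def apply_whitening(feats, mean, proj):
    """Center, project, and L2-normalize features of shape (n, d)."""
    out = (feats - mean) @ proj
    return out / np.linalg.norm(out, axis=1, keepdims=True)


def retrieve(query_vec, gallery_mat, k=5):
    """Return indices of the k gallery rows most similar to the query.

    Inputs are assumed L2-normalized, so the dot product is cosine similarity.
    """
    sims = gallery_mat @ query_vec
    return np.argsort(-sims)[:k]


# Stand-in data: 200 "satellite tile" features and one "query" feature that is
# a slightly perturbed copy of tile 17 (assumption made purely for the demo).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(200, 64))
query = gallery[17] + 0.01 * rng.normal(size=64)

mean, proj = fit_pca_whitening(gallery, dim=32)
gallery_w = apply_whitening(gallery, mean, proj)
query_w = apply_whitening(query[None, :], mean, proj)[0]

top = retrieve(query_w, gallery_w, k=5)
print(top)
```

In this sketch the whitening statistics are fitted on the gallery only and then reused for the query, so both sides live in the same refined feature space; whether the paper fits the transform this way is not stated in the truncated abstract.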

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Automated Road and Building Extraction · Robotics and Sensor-Based Localization