VFM-Loc: Zero-Shot Cross-View Geo-Localization via Aligning Discriminative Visual Hierarchies
Jun Lu, Zehao Sang, Haoqi Wei, Xiangyun Liu, Kun Zhu, Haitao Guo, Zhihui Gong, and Lei Ding

TL;DR
VFM-Loc introduces a training-free, zero-shot framework for cross-view geo-localization that aligns discriminative visual features from foundational models, significantly improving accuracy in real-world scenarios with large viewpoint differences.
Contribution
It proposes a novel hierarchical feature extraction and statistical alignment method leveraging vision foundational models for zero-shot cross-view geo-localization.
Findings
Achieves over 20% higher Recall@1 than supervised methods on the LO-UCV dataset.
Demonstrates strong zero-shot accuracy on standard benchmarks.
Establishes a training-free, robust paradigm for real-world CVGL.
Abstract
Cross-View Geo-Localization (CVGL) in remote sensing aims to locate a drone-view query by matching it to geo-tagged satellite images. Although supervised methods have achieved strong results on closed-set benchmarks, they often fail to generalize to unconstrained, real-world scenarios due to severe viewpoint differences and dataset bias. To overcome these limitations, we present VFM-Loc, a training-free framework for zero-shot CVGL that leverages the generalizable visual representations from vision foundational models (VFMs). VFM-Loc identifies and matches discriminative visual clues across different viewpoints through a progressive alignment strategy. First, we design a hierarchical clue extraction mechanism using Generalized Mean pooling and Scale-Weighted RMAC to preserve distinctive visual clues across scales while maintaining hierarchical confidence. Second, we introduce a…
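The Generalized Mean (GeM) pooling step named in the abstract can be illustrated with a minimal NumPy sketch. This is the standard GeM formulation, not the authors' implementation: the exponent p = 3, the (C, H, W) feature-map layout, and the helper names gem_pool, l2_normalize, and rank_gallery are assumptions made for illustration, and the Scale-Weighted RMAC and alignment stages are not covered.

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized Mean pooling of a (C, H, W) feature map into a C-dim descriptor.

    p = 1 reduces to average pooling; larger p moves toward max pooling.
    The default p = 3 is an assumed value, not one taken from the paper.
    """
    clamped = np.clip(feature_map, eps, None)  # keep values positive before the power
    return np.mean(clamped ** p, axis=(1, 2)) ** (1.0 / p)

def l2_normalize(v, eps=1e-12):
    """Scale vectors to unit length along the last axis."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def rank_gallery(query_desc, gallery_descs):
    """Return gallery indices sorted by descending cosine similarity to the query."""
    sims = l2_normalize(gallery_descs) @ l2_normalize(query_desc)
    return np.argsort(-sims)
```

In a retrieval setting, a drone-view descriptor would be passed as query_desc and the satellite descriptors as the rows of gallery_descs; the first index returned is the top-1 match used for Recall@1.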
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Remote-Sensing Image Classification
