GeoLink: A 3D-Aware Framework Towards Better Generalization in Cross-View Geo-Localization

Hongyang Zhang; Yinhao Liu; Haitao Zhang; Zhongyi Wen; Zhenyu Kuang; Shuxian Liang; Xiansheng Hua

arXiv:2604.13183·cs.CV·April 17, 2026

TL;DR

GeoLink introduces a 3D-aware framework that enhances cross-view geo-localization by leveraging reconstructed scene point clouds to improve 2D feature transferability and generalization across unseen regions.

Contribution

The paper proposes a novel 3D-aware semantic-consistent framework that uses scene reconstruction and relation distillation to improve cross-view geo-localization performance.

Findings

01. GeoLink outperforms state-of-the-art methods on multiple benchmarks.

02. It achieves superior generalization across unseen domains and weather conditions.

03. The framework effectively mitigates view bias and enhances 2D feature transferability.

Abstract

Generalizable cross-view geo-localization aims to match the same location across views in unseen regions and conditions without GPS supervision. Its core difficulty lies in two problems: severe semantic inconsistency caused by viewpoint variation, and poor generalization under domain shift. Existing methods mainly rely on 2D correspondence, but they are easily distracted by redundant information shared across views, which leads to less transferable representations. To address this, we propose GeoLink, a 3D-aware semantic-consistent framework for generalizable cross-view geo-localization. Specifically, we reconstruct scene point clouds offline from multi-view drone images using VGGT, which provides stable structural priors. Based on these 3D anchors, we improve 2D representation learning in two complementary ways. A Geometric-aware Semantic Refinement module mitigates potentially redundant and view-biased…
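To make the matching task concrete, here is a minimal sketch of how cross-view retrieval is commonly scored: each query embedding is compared against a gallery of embeddings from the other view, and a query counts as a hit if its true location appears in the top k. This is not GeoLink's implementation. The drone-query and satellite-gallery pairing, the toy vectors, and the helper names `rank_gallery` and `recall_at_k` are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [(cosine_similarity(query, g), idx) for idx, g in enumerate(gallery)]
    # Sort by descending similarity; break ties by the lower gallery index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scores]


def recall_at_k(queries, gallery, ground_truth, k=1):
    """Fraction of queries whose true gallery index is within the top-k results."""
    hits = 0
    for query, true_idx in zip(queries, ground_truth):
        if true_idx in rank_gallery(query, gallery)[:k]:
            hits += 1
    return hits / len(queries)


# Toy embeddings (hypothetical): two drone-view queries, three gallery locations.
drone_queries = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
satellite_gallery = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.3], [0.0, 0.0, 1.0]]
ground_truth = [0, 1]  # index of the matching gallery entry for each query

top1 = recall_at_k(drone_queries, satellite_gallery, ground_truth, k=1)
```

In this toy setup both queries retrieve their true location first, so `top1` is 1.0. Under domain shift the embeddings of matching views drift apart and this score drops, which is the generalization gap the abstract describes.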
