Geo^2: Geometry-Guided Cross-view Geo-Localization and Image Synthesis
Yancheng Zhang, Xiaohan Zhang, Guangyu Sun, Zonglin Lyu, Safwan Wshah, Chen Chen

TL;DR
Geo^2 is a unified framework that leverages geometric priors from foundation models to improve cross-view geo-localization and image synthesis by embedding features into a shared 3D-aware space.
Contribution
The paper introduces GeoMap and GeoFlow, which use 3D geometric priors to jointly address cross-view geo-localization and bidirectional (ground-to-aerial and aerial-to-ground) image synthesis, achieving state-of-the-art results.
Findings
Achieves state-of-the-art performance on CVUSA, CVACT, and VIGOR benchmarks.
Effectively reduces cross-view discrepancies through a shared 3D-aware latent space.
Ensures bidirectional synthesis coherence with a novel consistency loss.
Abstract
Cross-view geo-spatial learning consists of two important tasks: Cross-View Geo-Localization (CVGL) and Cross-View Image Synthesis (CVIS), both of which rely on establishing geometric correspondences between ground and aerial views. Recent Geometric Foundation Models (GFMs) have demonstrated strong capabilities in extracting generalizable 3D geometric features from images, but their potential in cross-view geo-spatial tasks remains underexplored. In this work, we present Geo^2, a unified framework that leverages Geometric priors from GFMs (e.g., VGGT) to jointly perform two geo-spatial tasks: CVGL and bidirectional CVIS. Despite the 3D reconstruction ability of GFMs, directly applying them to CVGL and CVIS remains challenging due to the large viewpoint gap between ground and aerial imagery. We propose GeoMap, which embeds ground and aerial features into a shared 3D-aware latent space,…
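To make the retrieval side of CVGL concrete, here is a minimal sketch of how geo-localization is typically scored once ground and aerial images have been embedded in a shared space: each ground query ranks all aerial candidates by cosine similarity, and recall@K counts how often the true match lands in the top K. This is not the authors' GeoMap implementation, whose architecture is not described in this summary. The function names (`rank_aerial`, `recall_at_k`) and the toy 3-dimensional embeddings are hypothetical and exist only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_aerial(ground_emb, aerial_embs):
    """Return aerial indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(ground_emb, a) for a in aerial_embs]
    return sorted(range(len(aerial_embs)), key=lambda i: scores[i], reverse=True)


def recall_at_k(ground_embs, aerial_embs, k=1):
    """Fraction of ground queries whose true match is in the top k.

    Assumption: ground_embs[i] and aerial_embs[i] depict the same location.
    """
    hits = 0
    for i, g in enumerate(ground_embs):
        if i in rank_aerial(g, aerial_embs)[:k]:
            hits += 1
    return hits / len(ground_embs)


# Toy embeddings (made up); in practice these would come from the trained encoders.
ground = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
aerial = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0], [0.0, 0.2, 0.7]]

print(rank_aerial(ground[0], aerial))
print(recall_at_k(ground, aerial, k=1))
```

In this framing, a shared latent space that reduces cross-view discrepancy would raise the similarity of matching ground/aerial pairs relative to non-matching ones, which is what recall@K measures.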
