Unifying UAV Cross-View Geo-Localization via 3D Geometric Perception
Haoyuan Li, Wen Yang, Fang Xu, Hong Tan, Haijian Zhang, Shengyang Li, and Gui-Song Xia

TL;DR
This paper introduces a geometry-aware UAV geo-localization framework that reconstructs 3D scenes and uses a virtual Bird's-Eye View for improved cross-view matching and pose estimation in GNSS-denied environments.
Contribution
The paper proposes a unified approach that combines 3D scene reconstruction, BEV rendering, and a novel Satellite-wise Attention Block for robust cross-view localization.
Findings
Achieves meter-level localization accuracy on refined University-1652 and SUES-200 datasets.
Outperforms state-of-the-art methods in urban environment localization.
Provides a new dataset with precise annotations for end-to-end evaluation.
Abstract
Cross-view geo-localization for Unmanned Aerial Vehicles (UAVs) operating in GNSS-denied environments remains challenging due to the severe geometric discrepancy between oblique UAV imagery and orthogonal satellite maps. Most existing methods address this problem through a decoupled pipeline of place retrieval and pose estimation, implicitly treating perspective distortion as appearance noise rather than an explicit geometric transformation. In this work, we propose a geometry-aware UAV geo-localization framework that explicitly models the 3D scene geometry to unify coarse place recognition and fine-grained pose estimation within a single inference pipeline. Our approach reconstructs a local 3D scene from multi-view UAV image sequences using a Visual Geometry Grounded Transformer (VGGT), and renders a virtual Bird's-Eye View (BEV) representation that orthorectifies the UAV perspective…
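The idea of rendering a virtual BEV from a reconstructed scene can be illustrated with a minimal sketch. It is not the authors' implementation and does not use VGGT. It assumes a colored point cloud is already available in a metric frame with z pointing up, and it orthographically projects that cloud onto a top-down grid, keeping the highest point in each pixel. The function name `render_bev` and the parameters `resolution` and `size` are hypothetical and chosen only for illustration.

```python
import numpy as np

def render_bev(points, colors, resolution=1.0, size=32):
    """Orthographically project colored 3D points onto a top-down image.

    points: (N, 3) x, y, z in meters with z up (assumed frame).
    colors: (N, 3) uint8 RGB, one row per point.
    resolution: meters per pixel; size: output width and height in pixels.
    """
    points = np.asarray(points, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.uint8)
    bev = np.zeros((size, size, 3), dtype=np.uint8)

    half = size * resolution / 2.0
    center = points[:, :2].mean(axis=0)  # center the grid on the cloud
    # Columns grow with x; rows grow as y decreases, so +y points "up".
    cols = np.floor((points[:, 0] - center[0] + half) / resolution).astype(int)
    rows = np.floor((half - (points[:, 1] - center[1])) / resolution).astype(int)

    inside = (cols >= 0) & (cols < size) & (rows >= 0) & (rows < size)
    if not inside.any():
        return bev
    flat = rows[inside] * size + cols[inside]
    z = points[inside, 2]
    rgb = colors[inside]

    # Sort by pixel, then by height; the last entry per pixel is the highest.
    order = np.lexsort((z, flat))
    flat_sorted = flat[order]
    is_last = np.r_[flat_sorted[1:] != flat_sorted[:-1], True]
    top = order[is_last]
    bev.reshape(-1, 3)[flat[top]] = rgb[top]
    return bev

# Usage: two points share a pixel; the higher one decides its color.
pts = np.array([[0, 0, 0], [0, 0, 10], [10, 10, 0], [-10, -10, 0]], dtype=float)
rgb = np.array([[0, 0, 255], [255, 0, 0], [0, 255, 0], [9, 9, 9]], dtype=np.uint8)
image = render_bev(pts, rgb)
```

Keeping the highest point per pixel is a crude stand-in for visibility from above. A real renderer would also need to handle holes, point density, and alignment of the grid to the satellite map's scale and orientation.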
