TL;DR
Sky2Ground introduces a diverse dataset for evaluating camera localization and reconstruction across varying altitudes, and proposes SkyNet to improve multi-view consistency, outperforming existing methods.
Contribution
The paper presents Sky2Ground, a comprehensive multi-altitude dataset, and SkyNet, a novel model that enhances cross-view consistency with curriculum training, advancing large-scale 3D perception.
Findings
The Sky2Ground dataset covers 51 sites, each with thousands of satellite, aerial, and ground images captured at different altitudes.
State-of-the-art pose estimation models often perform worse when satellite imagery is included.
SkyNet improves multi-view alignment, outperforming existing methods by 9.6% and 18.1%.
Abstract
We introduce Sky2Ground, a three-view dataset designed for varying-altitude camera localization, correspondence learning, and reconstruction. The dataset combines structured synthetic imagery with real, in-the-wild images, providing both controlled multi-view geometry and realistic scene noise. Each of the 51 sites contains thousands of satellite, aerial, and ground images spanning wide altitude ranges and nearly orthogonal viewing angles, enabling rigorous evaluation across global-to-local contexts. We benchmark state-of-the-art pose estimation models, including MASt3R, DUSt3R, MapAnything, and VGGT, and observe that the use of satellite imagery often degrades performance, highlighting the challenges under large altitude variations. We also examine reconstruction methods, highlighting the challenges introduced by sparse geometric overlap, varying perspectives, and the use of real…
