Seeing Across Skies and Streets: Feedforward 3D Reconstruction from Satellite, Drone, and Ground Images

Qiwei Wang; Zhongyao Tuo; Xianghui Ze; Yujiao Shi

arXiv:2605.07978 · cs.CV · May 11, 2026

TL;DR

This paper introduces Cross3R, a feed-forward model that takes satellite, UAV, and ground images and jointly reconstructs the 3D scene and estimates camera poses, going beyond the 3-DoF (position and yaw) estimates of traditional cross-view localization.

Contribution

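The paper presents Cross3R, a feed-forward model that combines a satellite tile, a UAV image, and a ground image for 3D reconstruction and camera pose estimation. The UAV and ground views need only overlap spatially; no known relative pose between them is required.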

Findings

01. Cross3R outperforms existing feed-forward baselines in 3D reconstruction and localization.

02. Cross3R surpasses dedicated cross-view methods on KITTI without training on it.

03. The CrossGeo dataset contains 278K images across 85 diverse scenes.

Abstract

Cross-view localization classically asks: where does this ground image lie on the satellite tile? Existing methods are typically limited to 3-DoF estimates -- an $(x, y)$ position and a yaw angle -- because nadir satellite imagery provides no direct cues for roll, pitch, or altitude, forcing a reliance on planar-motion and zero-tilt assumptions. These assumptions break on real terrain with slopes, ramps, and tilted camera mounts. To overcome this, we introduce a single UAV image as an intermediate viewpoint: it reveals the 3D structure invisible from nadir, supplies the cues for roll, pitch, and altitude that the satellite alone cannot provide, and needs only spatial overlap with the ground camera -- no known relative pose is required. Building on this insight, we propose **Cross3R**, a flexible feed-forward model that ingests a satellite tile together with a UAV image, a ground image,…
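
To make the 3-DoF limitation described in the abstract concrete, the following is a minimal illustrative sketch, not code from the paper. It assumes a ZYX (yaw, pitch, roll) Euler convention and a world frame with z pointing up; all function names are hypothetical. It builds a planar 3-DoF pose (zero tilt, fixed height) and a full 6-DoF pose, then measures how far the camera's forward axis is off when the real camera is pitched on a slope.

```python
import math

def rotation_zyx(yaw, pitch, roll):
    """R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians (assumed convention)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def pose_6dof(x, y, z, yaw, pitch, roll):
    """4x4 homogeneous pose with full position and orientation."""
    R = rotation_zyx(yaw, pitch, roll)
    return [R[0] + [x], R[1] + [y], R[2] + [z], [0.0, 0.0, 0.0, 1.0]]

def pose_3dof(x, y, yaw, assumed_height=0.0):
    """Planar-motion, zero-tilt pose: roll = pitch = 0 and a fixed height."""
    return pose_6dof(x, y, assumed_height, yaw, 0.0, 0.0)

def forward_axis(T):
    """Body x-axis expressed in the world frame (first rotation column)."""
    return [T[0][0], T[1][0], T[2][0]]

def angle_between(u, v):
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))

# A camera on a 10-degree ramp, versus the planar pose with the same (x, y, yaw).
planar = pose_3dof(5.0, 2.0, math.radians(30))
sloped = pose_6dof(5.0, 2.0, 1.5, math.radians(30), math.radians(10), 0.0)
error_deg = math.degrees(angle_between(forward_axis(planar), forward_axis(sloped)))
print(f"forward-axis error from ignoring pitch: {error_deg:.3f} degrees")
```

Under these assumptions the angular error equals the neglected pitch, and the 1.5 m height offset is lost entirely. These are the quantities (roll, pitch, altitude) that the abstract says nadir satellite imagery alone cannot constrain.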
