Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
Ming Qian, Zimin Xia, Changkun Liu, Shuailei Ma, Wen Wang, Zeran Ke, Bin Tan, Hang Zhang, Gui-Song Xia

TL;DR
Sat3DGen introduces a geometry-first approach with novel constraints and training strategies to generate accurate, photorealistic street-level 3D scenes from single satellite images, outperforming existing methods.
Contribution
The paper presents a new geometry-centric methodology that significantly improves 3D accuracy and realism in satellite-to-street scene generation, addressing key geometric challenges.
Findings
Reduced geometric RMSE from 6.76 m to 5.20 m on the benchmark.
Reduced FID from ~40 to 19 (lower is better), indicating higher photorealism.
Demonstrated versatility in applications like semantic mapping and DSM estimation.
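As a rough illustration of the geometric metric quoted in the findings, the sketch below computes a root-mean-square error over height samples using the standard definition, and the relative reduction implied by the two reported RMSE values. The function names and the toy height values are assumptions made for illustration; the paper's actual evaluation protocol (masking, alignment, resolution) is not described in this summary and may differ.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences of heights (metres)."""
    if len(predicted) != len(reference):
        raise ValueError("inputs must have the same length")
    if not predicted:
        raise ValueError("inputs must be non-empty")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(before, after):
    """Fractional reduction of an error metric, e.g. 0.25 means 25% lower."""
    return (before - after) / before


# Toy height samples in metres (hypothetical values, not taken from the paper).
pred = [10.0, 12.0, 8.0, 15.0]
ref = [11.0, 12.0, 6.0, 15.0]
toy_rmse = rmse(pred, ref)

# RMSE values as reported in the findings: 6.76 m before, 5.20 m after.
rmse_gain = relative_reduction(6.76, 5.20)

print(f"toy RMSE: {toy_rmse:.3f} m")
print(f"relative RMSE reduction: {rmse_gain:.1%}")
```

Under this definition, the reported change from 6.76 m to 5.20 m corresponds to a reduction of roughly 23% in RMSE.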
Abstract
Generating a street-level 3D scene from a single satellite image is a crucial yet challenging task. Current methods present a stark trade-off: geometry-colorization models achieve high geometric fidelity but are typically building-focused and lack semantic diversity. In contrast, proxy-based models use feed-forward image-to-3D frameworks to generate holistic scenes by jointly learning geometry and texture, a process that yields rich content but coarse and unstable geometry. We attribute these geometric failures to the extreme viewpoint gap and the sparse, inconsistent supervision inherent in satellite-to-street data. To address these fundamental challenges, we introduce Sat3DGen, which embodies a geometry-first methodology. This methodology enhances the feed-forward paradigm by integrating novel geometric constraints with a perspective-view training strategy, explicitly countering the…
