Sat2City: 3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion
Tongyan Hua, Lutao Jiang, Ying-Cong Chen, Wufan Zhao

TL;DR
Sat2City combines cascaded latent diffusion with sparse voxel grids to generate detailed 3D city models from a single satellite image, addressing both the structural ambiguity of limited 2D observations and the problem of limited data.
Contribution
The paper presents Sat2City, a method that integrates cascaded latent diffusion with sparse voxel grids, together with a new dataset for 3D city generation from satellite images.
Findings
Achieves higher-fidelity 3D city models than existing methods.
Successfully generates detailed 3D structures from a single satellite image.
Demonstrates effective handling of structural ambiguity in city modeling.
Abstract
Recent advancements in generative models have enabled 3D urban scene generation from satellite imagery, unlocking promising applications in gaming, digital twins, and beyond. However, most existing methods rely heavily on neural rendering techniques, which hinder their ability to produce detailed 3D structures on a broader scale, largely due to the inherent structural ambiguity derived from relatively limited 2D observations. To address this challenge, we propose Sat2City, a novel framework that synergizes the representational capacity of sparse voxel grids with latent diffusion models, tailored specifically for our novel 3D city dataset. Our approach is enabled by three key components: (1) A cascaded latent diffusion framework that progressively recovers 3D city structures from satellite imagery, (2) a Re-Hash operation at its Variational Autoencoder (VAE) bottleneck to compute…
