Sat2City: 3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion

Tongyan Hua; Lutao Jiang; Ying-Cong Chen; Wufan Zhao

arXiv:2507.04403 · cs.CV · July 8, 2025


TL;DR

Sat2City combines cascaded latent diffusion with sparse voxel grids to generate detailed 3D city models from a single satellite image, addressing the structural ambiguity of limited 2D observations and the shortage of suitable data.

Contribution

The paper presents Sat2City, a method that integrates cascaded latent diffusion with sparse voxel grids, together with a new dataset for 3D city generation from satellite images.

Findings

01. Achieves higher-fidelity 3D city models than existing methods.

02. Generates detailed 3D structures from a single satellite image.

03. Handles structural ambiguity in city modeling effectively.

Abstract

Recent advancements in generative models have enabled 3D urban scene generation from satellite imagery, unlocking promising applications in gaming, digital twins, and beyond. However, most existing methods rely heavily on neural rendering techniques, which hinder their ability to produce detailed 3D structures on a broader scale, largely due to the inherent structural ambiguity derived from relatively limited 2D observations. To address this challenge, we propose Sat2City, a novel framework that synergizes the representational capacity of sparse voxel grids with latent diffusion models, tailored specifically for our novel 3D city dataset. Our approach is enabled by three key components: (1) a cascaded latent diffusion framework that progressively recovers 3D city structures from satellite imagery, (2) a Re-Hash operation at its Variational Autoencoder (VAE) bottleneck to compute…
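The abstract relies on sparse voxel grids as the underlying 3D representation. As a rough illustration of that idea only, the sketch below stores occupied voxels in a dictionary keyed by integer coordinates and pools them to a coarser level. It is not the authors' implementation: the function names `voxelize` and `coarsen`, the sample points, and the coarse/fine pairing are all assumptions made for this example, and no diffusion model or Re-Hash operation is involved.

```python
from typing import Dict, Iterable, Tuple

Coord = Tuple[int, int, int]
Point = Tuple[float, float, float]


def voxelize(points: Iterable[Point], voxel_size: float) -> Dict[Coord, int]:
    """Map 3D points to a sparse grid: {integer voxel coordinate: point count}.

    Only occupied voxels are stored, so empty space costs no memory.
    """
    grid: Dict[Coord, int] = {}
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        grid[key] = grid.get(key, 0) + 1
    return grid


def coarsen(grid: Dict[Coord, int], factor: int = 2) -> Dict[Coord, int]:
    """Merge each factor x factor x factor block of voxels into one coarse voxel."""
    coarse: Dict[Coord, int] = {}
    for (i, j, k), count in grid.items():
        key = (i // factor, j // factor, k // factor)
        coarse[key] = coarse.get(key, 0) + count
    return coarse


# Hypothetical sample points (e.g. building surface samples), in arbitrary units.
points = [(0.2, 0.3, 0.1), (0.4, 0.1, 0.2), (3.5, 0.2, 1.9), (3.6, 0.4, 7.8)]
fine = voxelize(points, voxel_size=1.0)
coarse = coarsen(fine, factor=2)
print(len(fine), "occupied fine voxels;", len(coarse), "occupied coarse voxels")
```

A dense grid covering the same extent would allocate every cell, occupied or not. The dictionary grows only with the number of occupied voxels, which is the usual motivation for sparse grids in large scenes such as cities.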
