Designing streetscapes from street-view imagery using diffusion models

Yuzhou Chen; Yuebing Liang; Lingqian Hu; Kailai Sun; Qingqi Song; Chang Zhao; and Shenhao Wang

arXiv:2605.17527 · cs.CV · May 19, 2026

TL;DR

This paper introduces a diffusion model-based framework for generating realistic and controllable streetscape images from street-view imagery, supporting urban planning and design with alternative scenario visualization.

Contribution

It presents a novel multimodal dataset and demonstrates how diffusion models can synthesize semantically consistent streetscapes conditioned on visual and textual controls.

Findings

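01. Incorporating visual controls reduces LPIPS (Learned Perceptual Image Patch Similarity, a perceptual distance where lower is better) by ~6%.

02. Semantic consistency improves by 23.7% in Orlando and 46.4% in Chicago.

03. When imagery controls and textual prompts conflict, the imagery controls dominate.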

Abstract

Street-view imagery (SVI) is widely used to quantify key indicators of urban environment, such as greenery, sky, or road view indices. However, existing studies largely focus on measuring current streetscapes and rarely support the generation of alternative and non-existing urban scenarios, which is a core task in geospatial disciplines such as urban planning and design. To address this gap, we propose a generative multimodal AI framework that synthesizes alternative streetscapes conditioned on targeted visual metrics, enabling direct visual exploration of urban scenarios. We first construct a multimodal dataset that aligns SVIs with textual descriptions, segmentation maps, road masks, and quantitative metrics of visual elements in Chicago and Orlando. Using this dataset, we demonstrate that diffusion models can produce realistic and semantically consistent streetscape imagery while…
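
The greenery, sky, and road view indices mentioned in the abstract are commonly computed as the share of image pixels assigned to each class in a segmentation map. The sketch below illustrates that common definition only; it is not taken from the paper, and the class ids, the `view_indices` helper, and the toy map are all hypothetical.

```python
from collections import Counter

# Hypothetical class ids, chosen for illustration only.
# Real segmentation label sets use their own numbering.
CLASS_IDS = {"road": 0, "vegetation": 1, "sky": 2, "building": 3}


def view_indices(seg_map, class_ids=CLASS_IDS):
    """Return the fraction of pixels per class in a 2D segmentation map.

    seg_map is a list of rows, each row a list of integer class ids.
    """
    counts = Counter(label for row in seg_map for label in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("segmentation map is empty")
    return {name: counts.get(cid, 0) / total for name, cid in class_ids.items()}


# Toy 4x4 map standing in for a segmented street-view image.
toy_map = [
    [2, 2, 2, 2],
    [1, 3, 3, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]

indices = view_indices(toy_map)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Per-class fractions of this kind are one plausible form for the "quantitative metrics of visual elements" that the dataset aligns with each image, though the paper's exact metric definitions are not given in this summary.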
