From Geometric Mimicry to Comprehensive Generation: A Context-Informed Multimodal Diffusion Model for Urban Morphology Synthesis

Fangshuo Zhou; Huaxia Li; Liuchang Xu; Rui Hu; Sensen Wu; Liang Xu; Hailin Feng; Zhenhong Du

arXiv:2409.17049 · cs.CV · March 19, 2026


TL;DR

This paper introduces ControlCity, a multimodal diffusion model that synthesizes urban morphology by integrating images, text, and metadata, significantly improving realism and controllability over traditional geometric methods.

Contribution

The study presents a novel multimodal diffusion framework for urban morphology generation, combining spatial, semantic, and geographical data for more accurate and controllable urban simulations.

Findings

01. 71.01% reduction in visual error (FID)
02. 38.46% improvement in spatial overlap (MIoU)
03. Enables cross-city style transfer and zero-shot generation
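For readers unfamiliar with the spatial overlap metric, the following is a minimal sketch of how mean intersection-over-union (MIoU) is commonly computed for a building-footprint mask, plus the relative-change arithmetic behind a figure such as "71.01% reduction". It is an illustration only: the function names `mean_iou` and `relative_change`, the two-class labeling (0 = background, 1 = building), and the toy masks are assumptions, not the paper's evaluation code.

```python
from typing import List, Sequence

# Assumption: a mask is a 2D grid of integer class labels
# (0 = background, 1 = building footprint).
Mask = Sequence[Sequence[int]]


def mean_iou(pred: Mask, target: Mask, num_classes: int = 2) -> float:
    """Average the per-class IoU, skipping classes absent from both masks."""
    intersection: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: counts toward both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


def relative_change(baseline: float, new: float) -> float:
    """Signed relative change in percent; a negative value is a reduction."""
    return (new - baseline) / baseline * 100.0


# Toy example: one building pixel is over-predicted.
predicted = [[1, 1], [0, 0]]
reference = [[1, 0], [0, 0]]
score = mean_iou(predicted, reference)
```

Under this definition a higher MIoU is better, whereas a lower FID is better, which is why the two findings are reported as an improvement and a reduction respectively.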

Abstract

Urban morphology is fundamental to determining urban functionality and vitality. Prevailing simulation methods, however, often oversimplify morphological generation as a geometric problem and lack a deep understanding of urban semantics and geographical context. To address this limitation, this study proposes ControlCity, a diffusion model that achieves comprehensive urban morphology generation through multimodal information fusion. We first constructed a quadruple dataset comprising "image-text-metadata-building footprints" from 22 cities worldwide. ControlCity uses this multidimensional information as joint control conditions: an enhanced ControlNet architecture encodes spatial constraints from images, while text and metadata provide semantic guidance and geographical priors, respectively, collectively directing the generation process. Experimental results demonstrate…
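As a rough illustration of the "image-text-metadata-building footprints" quadruple, one sample could be modeled as a small record in which the first three fields are control conditions and the footprints are the generation target. This is a sketch under stated assumptions: the class name `UrbanSample`, every field name, and the example values are hypothetical and are not taken from the paper or the fangshuoz/controlcity repository.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Assumption: a footprint is a polygon given as (x, y) vertex pairs.
Polygon = List[Tuple[float, float]]


@dataclass
class UrbanSample:
    """Hypothetical container for one quadruple sample."""

    image_path: str             # spatial condition (e.g. a rendered map tile)
    text: str                   # semantic condition (a textual description)
    metadata: Dict[str, float]  # geographical prior (e.g. coordinates)
    footprints: List[Polygon] = field(default_factory=list)  # target

    def conditions(self) -> Dict[str, Any]:
        """Return only the three control conditions, without the target."""
        return {
            "image": self.image_path,
            "text": self.text,
            "metadata": self.metadata,
        }


# Placeholder values for illustration only.
sample = UrbanSample(
    image_path="tiles/example.png",
    text="dense residential blocks along a river",
    metadata={"lon": 0.0, "lat": 0.0},
    footprints=[[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]],
)
controls = sample.conditions()
```

Separating the conditions from the target in this way mirrors the abstract's description of images, text, and metadata jointly directing generation, while the footprints are what the model is asked to produce.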


Code & Models

Repositories

fangshuoz/controlcity
PyTorch · Official


Taxonomy

Methods: Diffusion · ALIGN