UrbanWorld: An Urban World Model for 3D City Generation
Yu Shang, Yuming Lin, Yu Zheng, Hangyu Fan, Jingtao Ding, Jie Feng, Jiansheng Chen, Li Tian, Yong Li

TL;DR
UrbanWorld is a novel generative model that automatically creates realistic, customizable, and interactive 3D urban environments, facilitating AI perception and navigation research.
Contribution
It introduces the first comprehensive urban world model integrating layout, scene design, rendering, and refinement with flexible control and open-source availability.
Findings
Achieves state-of-the-art realism in 3D urban environment generation.
Demonstrates controllable scene creation using textual and image prompts.
Validates interactive agent perception and navigation within generated environments.
Abstract
Cities, as the essential environment of human life, encompass diverse physical elements such as buildings, roads and vegetation, which continuously interact with dynamic entities like people and vehicles. Crafting realistic, interactive 3D urban environments is essential for nurturing AGI systems and constructing AI agents capable of perceiving, decision-making, and acting like humans in real-world environments. However, creating high-fidelity 3D urban environments usually entails extensive manual labor from designers, involving intricate detailing and representation of complex urban elements. Therefore, accomplishing this automatically remains a longstanding challenge. Toward this problem, we propose UrbanWorld, the first generative urban world model that can automatically create a customized, realistic and interactive 3D urban world with flexible control conditions. UrbanWorld…
Peer Reviews
Decision · Submitted to ICLR 2025
1. The task of 3D urban generation is important. 2. The method is reasonable and looks to have better quantitative results than previous models. 3. The writing is clear and easy to follow.
1. Claim of World Model. This work belongs to 3D urban generation. Calling it a world model is an over-claim, and it is barely related to AGI. The authors should precisely identify the task and topic, then focus on that specific topic and make it comprehensive rather than claiming large topics. 2. Technical contributions. The motivation for the generation pipeline is unclear. Why do you need a vision language model? What special designs in your work differ from others, and why do you need the
1. The paper is written fluently and is easy to understand. 2. The proposed method shows relatively better results in generating city scenes with assets that have new appearances. 3. The authors effectively showcase various capabilities of the pipeline.
1. While the authors state that the method achieves “customized, realistic, and interactive 3D urban world generation,” the results appear more simulation-style and fall short of true realism. The texture quality, as seen in Fig. 3 and 4, is not particularly impressive, and there are no significant improvements over CityDreamer. 2. The absence of video results is notable. For a 3D generation task, video demonstrations would better illustrate the quality and realism of the generated scenes. 3. Fi
UrbanWorld introduces a pipeline that integrates generative diffusion models with an urban-specific MLLM to achieve realistic urban scene creation. This combination allows for controlled generation of 3D assets and adaptive urban design.
1. The authors claim that section A is flexible urban layout “generation”. However, this is unlike generation methods in which the distribution of urban layouts is learned from real-world data [1][2][3]. It seems that the authors are just using OSM’s GT data (AIGC-layout is not explained anywhere in the paper). No detail is given on how the authors transformed the OSM data or AIGC data into an untextured 3D urban environment. Are any generation models or other networks involved? In short, if y
Taxonomy
Topics: 3D Modeling in Geospatial Applications · 3D Surveying and Cultural Heritage · Remote Sensing and LiDAR Applications
Methods: Diffusion
