MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts

Zilong Huang; Jun He; Xiaobin Huang; Ziyi Xiong; Yang Luo; Junyan Ye; Weijia Li; Yiping Chen; Ting Han

arXiv:2511.20415 · cs.CV · December 9, 2025


Open Access

TL;DR

MajutsuCity is a novel framework that uses natural language to generate and edit diverse, realistic 3D city scenes with controllable layouts and assets, advancing the state-of-the-art in 3D urban scene synthesis.

Contribution

It introduces a language-driven, aesthetically adaptive pipeline with an interactive editing agent and a high-quality dataset, enabling improved controllability and realism in 3D city generation.

Findings

01

Significantly reduces layout FID compared to prior methods.

02

Achieves top scores in structural consistency and stylistic diversity.

03

Outperforms existing methods in geometric fidelity and controllability.
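The first finding reports a lower layout FID. This sketch assumes that layout FID is the standard Fréchet Inception Distance computed on feature embeddings of real and generated layout maps. The feature extractor and layout encoding are not specified here, so both are assumptions. Under that reading, Gaussians are fitted to each feature set and compared with FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}), where lower is better. The function name `frechet_distance` is hypothetical.

```python
import numpy as np

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_real, feats_gen: arrays of shape (n_samples, dim), e.g. embeddings
    of real and generated layout maps from a fixed feature extractor
    (assumed; the extractor used for layout FID is not given here).
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    # rowvar=False: columns are feature dimensions; atleast_2d handles dim == 1
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))
    # Tr((Sigma_r Sigma_g)^{1/2}) is the sum of square roots of the eigenvalues
    # of Sigma_r @ Sigma_g. These are real and non-negative for PSD inputs,
    # so tiny negative or imaginary parts from round-off are clipped away.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = float(np.sum((mu_r - mu_g) ** 2))
    return mean_term + float(np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 4))
print(frechet_distance(real, real))        # identical sets: close to 0
print(frechet_distance(real, real + 1.0))  # unit shift in 4 dims: close to 4
```

A pure mean shift leaves the covariance terms cancelling, so the score reduces to the squared distance between means. This makes the second call a quick sanity check.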

Abstract

Generating realistic 3D cities is fundamental to world models, virtual reality, and game development, where an ideal urban scene must offer stylistic diversity, fine-grained detail, and controllability. However, existing methods struggle to balance the creative flexibility offered by text-based generation with the object-level editability enabled by explicit structural representations. We introduce MajutsuCity, a natural-language-driven and aesthetically adaptive framework for synthesizing structurally consistent and stylistically diverse 3D urban scenes. MajutsuCity represents a city as a composition of controllable layouts, assets, and materials, and operates through a four-stage pipeline. To extend controllability beyond initial generation, we further integrate MajutsuAgent, an interactive language-grounded editing agent that supports five object-level operations. To support…
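The abstract describes a city as a composition of layouts, assets, and materials that an agent can edit object by object. The sketch below is a minimal illustration under stated assumptions. The class names, fields, and the three edit methods (`add`, `remove`, `set_material`) are all hypothetical. The paper's five object-level operations are not listed in this excerpt, and this is not MajutsuCity's actual data model. The point is only that an explicit per-object representation keeps each edit local to one entry.

```python
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    category: str                  # layout class, e.g. "building" (hypothetical)
    position: tuple[float, float]  # 2D layout coordinates (hypothetical)
    asset_id: str                  # which 3D asset instantiates this object
    material: str                  # appearance label applied to the asset

@dataclass
class CityScene:
    # Hypothetical container: one entry per editable object, keyed by id
    objects: dict[int, SceneObject] = field(default_factory=dict)
    next_id: int = 0

    def add(self, obj: SceneObject) -> int:
        """Insert an object and return its new id."""
        oid = self.next_id
        self.objects[oid] = obj
        self.next_id += 1
        return oid

    def remove(self, oid: int) -> None:
        """Delete one object; all other objects are untouched."""
        del self.objects[oid]

    def set_material(self, oid: int, material: str) -> None:
        """Restyle a single object without regenerating the scene."""
        self.objects[oid].material = material

scene = CityScene()
tower = scene.add(SceneObject("building", (10.0, 20.0), "tower_03", "glass"))
park = scene.add(SceneObject("tree", (4.0, 7.5), "oak_01", "foliage"))
scene.set_material(tower, "red brick")
scene.remove(park)
print(scene.objects[tower].material)  # red brick
```

In a language-driven editor, a parsed instruction would map to one such method call. Storing objects explicitly rather than as a single generated mesh is what makes that mapping possible.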


Taxonomy

Topics: 3D Shape Modeling and Analysis · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis