MeSS: City Mesh-Guided Outdoor Scene Generation with Cross-View Consistent Diffusion

Xuyang Chen; Zhijun Zhai; Kaixuan Zhou; Zengmao Wang; Jianan He; Dong Wang; Yanfeng Zhang; Mingwei Sun; Rüdiger Westermann; Konrad Schindler; Liqiu Meng

arXiv:2508.15169·cs.CV·January 6, 2026

TL;DR

MeSS introduces a novel pipeline that combines enhanced image diffusion models with geometric priors and control mechanisms to generate high-quality, cross-view consistent outdoor city scenes from mesh models, improving realism and style diversity.

Contribution

The paper presents a new method that improves cross-view consistency in outdoor scene generation by integrating control-based diffusion models with geometric priors and scene reconstruction.

Findings

01

Outperforms existing methods in geometric alignment.

02

Produces high-quality, style-consistent outdoor scenes.

03

Enables diverse scene rendering through relighting and style transfer.

Abstract

Mesh models have become increasingly accessible for numerous cities; however, the lack of realistic textures restricts their application in virtual urban navigation and autonomous driving. To address this, this paper proposes MeSS (Mesh-based Scene Synthesis) for generating high-quality, style-consistent outdoor scenes with city mesh models serving as the geometric prior. While image and video diffusion models can leverage spatial layouts (such as depth maps or HD maps) as control conditions to generate street-level perspective views, they are not directly applicable to 3D scene generation. Video diffusion models excel at synthesizing consistent view sequences that depict scenes but often struggle to adhere to predefined camera paths or align accurately with rendered control videos. In contrast, image diffusion models, though unable to guarantee cross-view visual consistency, can produce…
