X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability
Yu Yang, Alan Liang, Jianbiao Mei, Yukai Ma, Yong Liu, Gim Hee Lee

TL;DR
X-Scene is a novel framework for large-scale driving scene generation that combines high fidelity, geometric detail, and flexible multi-level control, advancing autonomous driving data synthesis and simulation.
Contribution
The paper introduces X-Scene, a framework for large-scale 3D driving scene generation that combines multi-granular control with a unified pipeline for geometrically and visually faithful, spatially coherent scenes.
Findings
Achieves high geometric and visual fidelity in large-scale scenes.
Supports multi-granular control: low-level layout conditioning from user input or text, and high-level semantic guidance from user intent and LLM-enriched prompts.
Demonstrates improved realism and controllability over existing methods.
Abstract
Diffusion models are advancing autonomous driving by enabling realistic data synthesis, predictive end-to-end planning, and closed-loop simulation, with a primary focus on temporally consistent generation. However, large-scale 3D scene generation requiring spatial coherence remains underexplored. In this paper, we present X-Scene, a novel framework for large-scale driving scene generation that achieves geometric intricacy, appearance fidelity, and flexible controllability. Specifically, X-Scene supports multi-granular control, including low-level layout conditioning driven by user input or text for detailed scene composition, and high-level semantic guidance informed by user intent and LLM-enriched prompts for efficient customization. To enhance geometric and visual fidelity, we introduce a unified pipeline that sequentially generates 3D semantic occupancy and corresponding multi-view…
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Advanced Image and Video Retrieval Techniques
