UniScene: Unified Occupancy-centric Driving Scene Generation

Bohan Li; Jiazhe Guo; Hongsi Liu; Yingshuang Zou; Yikang Ding; Xiwu Chen; Hu Zhu; Feiyang Tan; Chi Zhang; Tiancai Wang; Shuchang Zhou; Li Zhang; Xiaojuan Qi; Hao Zhao; Mu Yang; Wenjun Zeng; Xin Jin

arXiv:2412.05435·cs.CV·March 12, 2025

TL;DR

UniScene is a unified framework that generates diverse, high-quality driving scene data including semantic occupancy, video, and LiDAR, improving over previous methods by using a hierarchical, occupancy-centric approach.

Contribution

UniScene is presented as the first unified, occupancy-centric framework for generating multiple data forms in driving scenes. It uses a hierarchical process: semantic occupancy is generated first, and video and LiDAR are then generated conditioned on that occupancy.

Findings

01 Outperforms previous state-of-the-art methods in occupancy, video, and LiDAR generation.

02 Provides semantic occupancy as a detailed intermediate representation for downstream tasks.

03 Reduces generation complexity for intricate scenes.

Abstract

Generating high-fidelity, controllable, and annotated training data is critical for autonomous driving. Existing methods typically generate a single data form directly from a coarse scene layout, which not only fails to output rich data forms required for diverse downstream tasks but also struggles to model the direct layout-to-data distribution. In this paper, we introduce UniScene, the first unified framework for generating three key data forms - semantic occupancy, video, and LiDAR - in driving scenes. UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps: (a) first generating semantic occupancy from a customized scene layout as a meta scene representation rich in both semantic and geometric information, and then (b) conditioned on occupancy, generating video and LiDAR data, respectively, with two novel…
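The two-step decomposition described in the abstract can be illustrated with a minimal data-flow sketch. Every name here (`SceneLayout`, `generate_occupancy`, `generate_video`, `generate_lidar`, `generate_scene`) is a hypothetical placeholder, not part of the UniScene code, and the function bodies are trivial stubs that stand in for the actual generative models. The sketch only shows the ordering: occupancy is produced from the layout first, and both video and LiDAR are then derived from that same occupancy.

```python
from dataclasses import dataclass
from typing import List, Tuple

# All names below are hypothetical placeholders, not UniScene's API.
Occupancy = List[List[List[int]]]  # semantic class ids indexed as [z][y][x]


@dataclass
class SceneLayout:
    """Coarse scene layout; a 2D grid of semantic class ids (0 = empty)."""
    cells: List[List[int]]


@dataclass
class SceneData:
    occupancy: Occupancy
    video: List[str]                    # stand-in for generated frames
    lidar: List[Tuple[int, int, int]]   # stand-in for generated points


def generate_occupancy(layout: SceneLayout, height: int = 2) -> Occupancy:
    """Step (a): layout -> semantic occupancy. Stub: stacks the layout along z."""
    return [[row[:] for row in layout.cells] for _ in range(height)]


def generate_video(occupancy: Occupancy, num_frames: int = 3) -> List[str]:
    """Step (b): occupancy -> video. Stub: one text label per frame."""
    occupied = sum(v != 0 for plane in occupancy for row in plane for v in row)
    return [f"frame {t}: {occupied} occupied voxels" for t in range(num_frames)]


def generate_lidar(occupancy: Occupancy) -> List[Tuple[int, int, int]]:
    """Step (b): occupancy -> LiDAR. Stub: one point per occupied voxel."""
    return [
        (x, y, z)
        for z, plane in enumerate(occupancy)
        for y, row in enumerate(plane)
        for x, v in enumerate(row)
        if v != 0
    ]


def generate_scene(layout: SceneLayout) -> SceneData:
    # Occupancy is generated once and shared by both downstream generators.
    occupancy = generate_occupancy(layout)
    return SceneData(occupancy, generate_video(occupancy), generate_lidar(occupancy))


scene = generate_scene(SceneLayout(cells=[[0, 1], [2, 0]]))
```

Because both downstream stubs read the same occupancy grid, the video and LiDAR outputs stay consistent with each other by construction, which is the point of using occupancy as the shared intermediate representation.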

Code & Models

Repositories

arlo0o/uniscene-unified-occupancy-centric-driving-scene-generation

Models

🤗 Arlolo0/UniScene (model)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Remote Sensing and LiDAR Applications