OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation

Bohan Li; Xin Jin; Jianan Wang; Yukai Shi; Yasheng Sun; Xiaofeng Wang; Zhuang Ma; Baao Xie; Chao Ma; Xiaokang Yang; Wenjun Zeng

arXiv:2412.11183·cs.CV·August 25, 2025

TL;DR

OccScene introduces a unified framework that jointly enhances 3D scene generation and perception through mutual learning, using semantic occupancy guidance and a dual alignment module to produce realistic scenes and improve perception accuracy.

Contribution

The paper presents OccScene, a novel mutual learning paradigm that integrates 3D scene generation and perception within a single diffusion-based framework, enabling cross-task improvements.

Findings

1. Achieves realistic 3D scene generation in indoor and outdoor scenarios.
2. Significantly improves 3D semantic occupancy prediction performance.
3. Demonstrates the effectiveness of joint perception and generation training.

Abstract

Recent diffusion models have demonstrated remarkable performance in both 3D scene generation and perception tasks. Nevertheless, existing methods typically separate these two processes, treating the generator merely as a data augmenter that produces synthetic data for downstream perception tasks. In this work, we propose OccScene, a novel mutual learning paradigm that integrates fine-grained 3D perception and high-quality generation in a unified framework, achieving a cross-task win-win effect. OccScene generates novel, consistent, and realistic 3D scenes from text prompts alone, guided by semantic occupancy in a jointly trained diffusion framework. To align the occupancy with the diffusion latent, a Mamba-based Dual Alignment module is introduced to incorporate fine-grained semantics and geometry as perception priors. Within OccScene, the perception module can be effectively improved with customized and…

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Motion and Animation · Image Processing and 3D Reconstruction

Methods: Diffusion · ALIGN