Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving

Xiang Li; Pengfei Li; Yupeng Zheng; Wei Sun; Yan Wang; Yilun Chen

arXiv:2502.07309·cs.CV·February 12, 2025

TL;DR

PreWorld is a semi-supervised, vision-centric 3D occupancy world model for autonomous driving that leverages 2D labels and a novel training paradigm to predict future scenes and trajectories effectively.

Contribution

It introduces a two-stage training approach combining self-supervised pre-training with supervised fine-tuning, enabling effective 3D occupancy prediction with limited 3D labels.

Findings

01. Achieves competitive 3D occupancy prediction results.

02. Demonstrates effective 4D occupancy forecasting.

03. Validates scalability and effectiveness on the nuScenes dataset.

Abstract

Understanding world dynamics is crucial for planning in autonomous driving. Recent methods attempt to achieve this by learning a 3D occupancy world model that forecasts future surrounding scenes based on current observation. However, 3D occupancy labels are still required to produce promising results. Considering the high annotation cost for 3D outdoor scenes, we propose a semi-supervised vision-centric 3D occupancy world model, PreWorld, to leverage the potential of 2D labels through a novel two-stage training paradigm: the self-supervised pre-training stage and the fully-supervised fine-tuning stage. Specifically, during the pre-training stage, we utilize an attribute projection head to generate different attribute fields of a scene (e.g., RGB, density, semantic), thus enabling temporal supervision from 2D labels via volume rendering techniques. Furthermore, we introduce a simple yet…
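The abstract mentions supervising attribute fields (RGB, density, semantic) with 2D labels via volume rendering. As a rough illustration only, the sketch below composites per-sample attributes along a single camera ray using the standard NeRF-style quadrature; it is not taken from the PreWorld code, and the function name `render_ray`, the argument shapes, and the sample spacing `deltas` are assumptions made for this example.

```python
import numpy as np

def render_ray(density, attributes, deltas):
    """Composite per-sample attributes along one ray.

    Hypothetical helper for illustration; not the paper's implementation.
    density:    (N,) non-negative density at each sample along the ray
    attributes: (N, C) per-sample attribute, e.g. RGB or semantic scores
    deltas:     (N,) distance between consecutive samples
    """
    density = np.asarray(density, dtype=np.float64)
    attributes = np.asarray(attributes, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)

    # Opacity contributed by each sample interval.
    alpha = 1.0 - np.exp(-density * deltas)
    # Transmittance: fraction of the ray that reaches sample i unblocked.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    # Rendering weight of each sample; the weights sum to at most 1.
    weights = trans * alpha
    # Rendered 2D value for this pixel: weighted sum of sample attributes.
    rendered = weights @ attributes
    return rendered, weights
```

The rendered value can then be compared against a 2D label for the corresponding pixel (for example, an image color or a 2D semantic label), which is the general mechanism that lets 2D annotations supervise a 3D field. The loss and the way rays are sampled are left out here because the truncated abstract does not specify them.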

Code & Models

Repositories

getterupper/PreWorld (PyTorch, official)

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications