UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction for Autonomous Driving

Chen Min; Liang Xiao; Dawei Zhao; Yiming Nie; Bin Dai

arXiv:2305.18829 · cs.CV · April 30, 2024 · 1 citation

TL;DR

UniScene introduces a multi-camera unified pre-training framework for autonomous driving that reconstructs 3D scenes using occupancy, leveraging unlabeled data to improve 3D perception tasks and reduce annotation costs.

Contribution

It is the first to propose a multi-camera unified pre-training approach based on 3D scene reconstruction, enhancing multi-camera perception performance.

Findings

01. Improves multi-camera 3D object detection mAP by 2.0%.

02. Increases surrounding semantic scene completion mIoU by 3%.

03. Reduces 3D training annotation costs by 25%.

Abstract

Multi-camera 3D perception has emerged as a prominent research field in autonomous driving, offering a viable and cost-effective alternative to LiDAR-based solutions. The existing multi-camera algorithms primarily rely on monocular 2D pre-training. However, the monocular 2D pre-training overlooks the spatial and temporal correlations among the multi-camera system. To address this limitation, we propose the first multi-camera unified pre-training framework, called UniScene, which involves initially reconstructing the 3D scene as the foundational stage and subsequently fine-tuning the model on downstream tasks. Specifically, we employ Occupancy as the general representation for the 3D scene, enabling the model to grasp geometric priors of the surrounding world through pre-training. A significant benefit of UniScene is its capability to utilize a considerable volume of unlabeled…
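To make the occupancy idea concrete, here is a minimal sketch of what a binary occupancy target and a reconstruction loss could look like. It is an illustration under stated assumptions, not the authors' implementation: the visible part of the abstract does not say where the occupancy targets come from or which loss is used, so the sketch assumes a 3D point cloud as the source of occupied voxels and a plain binary cross-entropy loss. The names `voxelize_occupancy` and `binary_occupancy_loss`, the grid bounds, and the voxel size are all hypothetical.

```python
import numpy as np


def voxelize_occupancy(points, grid_min, grid_max, voxel_size):
    """Mark a voxel as occupied if at least one 3D point falls inside it.

    Assumption: `points` is an (N, 3) array of x, y, z coordinates in the
    same frame as the grid bounds. This is an illustrative target only.
    """
    points = np.asarray(points, dtype=np.float64)
    grid_min = np.asarray(grid_min, dtype=np.float64)
    grid_max = np.asarray(grid_max, dtype=np.float64)

    # Number of voxels along each axis.
    shape = tuple(int(n) for n in np.ceil((grid_max - grid_min) / voxel_size))
    occupancy = np.zeros(shape, dtype=bool)

    # Discard points that lie outside the grid volume.
    inside = np.all((points >= grid_min) & (points < grid_max), axis=1)

    # Convert the remaining points to integer voxel indices.
    idx = np.floor((points[inside] - grid_min) / voxel_size).astype(int)
    idx = np.minimum(idx, np.array(shape) - 1)  # guard against float rounding

    occupancy[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return occupancy


def binary_occupancy_loss(logits, target):
    """Mean binary cross-entropy between predicted logits and a 0/1 grid.

    Assumption: the network outputs one logit per voxel. The stable form
    max(x, 0) - x * t + log(1 + exp(-|x|)) avoids overflow for large |x|.
    """
    logits = np.asarray(logits, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    loss = (np.maximum(logits, 0.0) - logits * target
            + np.log1p(np.exp(-np.abs(logits))))
    return float(loss.mean())


if __name__ == "__main__":
    # Two points inside a 4 m x 4 m x 2 m grid, one far outside it.
    pts = [[0.5, 0.5, 0.5], [3.9, 3.9, 1.9], [10.0, 10.0, 10.0]]
    grid = voxelize_occupancy(pts, (0, 0, 0), (4, 4, 2), voxel_size=1.0)
    print(grid.shape, int(grid.sum()))
    print(binary_occupancy_loss(np.zeros(grid.shape), grid))
```

Because such a target needs only geometry and no human-drawn boxes or semantic labels, it fits the abstract's point that pre-training can draw on unlabeled data.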

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Domain Adaptation and Few-Shot Learning