Learning Shared RGB-D Fields: Unified Self-supervised Pre-training for Label-efficient LiDAR-Camera 3D Perception

Xiaohao Xu; Ye Li; Tianyi Zhang; Jinrong Yang; Matthew Johnson-Roberson; Xiaonan Huang

arXiv:2405.17942·cs.CV·October 15, 2024

TL;DR

This paper introduces NS-MAE, a unified self-supervised pretraining method for LiDAR and camera data in 3D perception, leveraging NeRF to improve label efficiency and transferability in autonomous driving tasks.

Contribution

The paper proposes NS-MAE, a unified pretraining framework that jointly optimizes multi-modal data through NeRF-based masked autoencoding and outperforms strategies that pretrain each modality separately.

Findings

01. NS-MAE achieves superior transferability across 3D perception tasks.
02. It outperforms prior state-of-the-art methods in BEV map segmentation.
03. The approach enhances label-efficient fine-tuning in autonomous driving scenarios.

Abstract

Constructing large-scale labeled datasets for multi-modal perception model training in autonomous driving presents significant challenges. This has motivated the development of self-supervised pretraining strategies. However, existing pretraining methods mainly employ distinct approaches for each modality. In contrast, we focus on LiDAR-Camera 3D perception models and introduce a unified pretraining strategy, NeRF-Supervised Masked Auto Encoder (NS-MAE), which optimizes all modalities through a shared formulation. NS-MAE leverages NeRF's ability to encode both appearance and geometry, enabling efficient masked reconstruction of multi-modal data. Specifically, embeddings are extracted from corrupted LiDAR point clouds and images, conditioned on view directions and locations. Then, these embeddings are rendered into multi-modal feature maps from two crucial viewpoints for 3D driving…
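
The abstract is cut off before the rendering details, so the following is only a generic NeRF-style volume-rendering sketch, not the NS-MAE implementation. It illustrates how per-sample density and color along one ray can be composited into both an appearance value (RGB) and a geometry value (expected depth), which is the property of NeRF the abstract relies on. The function name `render_ray`, its arguments, and the handling of the last sample's spacing are assumptions made for illustration.

```python
import math


def render_ray(densities, colors, depths):
    """Composite samples along one ray into an RGB value and an expected depth.

    Hypothetical helper, standard NeRF quadrature (not taken from the paper):
      densities: non-negative density sigma_i for each sample
      colors:    (r, g, b) tuple for each sample
      depths:    increasing distances t_i of the samples along the ray
    """
    n = len(depths)
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    weights = []
    for i in range(n):
        # Spacing to the next sample; the last sample reuses the previous
        # spacing (an assumption, other conventions exist).
        if i + 1 < n:
            delta = depths[i + 1] - depths[i]
        elif n > 1:
            delta = depths[i] - depths[i - 1]
        else:
            delta = 1.0
        alpha = 1.0 - math.exp(-densities[i] * delta)  # opacity of this segment
        w = transmittance * alpha                      # contribution of sample i
        weights.append(w)
        for c in range(3):
            rgb[c] += w * colors[i][c]
        depth += w * depths[i]
        transmittance *= 1.0 - alpha
    return rgb, depth, weights


# A ray that passes through empty space and hits an opaque green surface at t = 2.
rgb, depth, weights = render_ray(
    densities=[0.0, 50.0, 0.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    depths=[1.0, 2.0, 3.0],
)
```

Because the same weights produce both outputs, a single rendered field can be supervised with image pixels and with LiDAR-derived depth; in a masked-autoencoding setup the rendered values would be compared against the uncorrupted inputs.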

Code & Models

Repositories

xiaohao-xu/unified-pretrain-ad
PyTorch · Official

Taxonomy

Topics: Neural Networks and Applications · Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques

Methods: Focus