Learning Shared RGB-D Fields: Unified Self-supervised Pre-training for Label-efficient LiDAR-Camera 3D Perception
Xiaohao Xu, Ye Li, Tianyi Zhang, Jinrong Yang, Matthew Johnson-Roberson, Xiaonan Huang

TL;DR
This paper introduces NS-MAE, a unified self-supervised pretraining method for LiDAR and camera data in 3D perception, leveraging NeRF to improve label efficiency and transferability in autonomous driving tasks.
Contribution
It proposes NS-MAE, a unified pretraining framework that optimizes all modalities jointly through NeRF-based masked autoencoding, outperforming strategies that pretrain each modality separately.
Findings
NS-MAE achieves superior transferability across 3D perception tasks.
It outperforms prior state-of-the-art methods in BEV map segmentation.
The approach enhances label-efficient fine-tuning in autonomous driving scenarios.
Abstract
Constructing large-scale labeled datasets for multi-modal perception model training in autonomous driving presents significant challenges. This has motivated the development of self-supervised pretraining strategies. However, existing pretraining methods mainly employ distinct approaches for each modality. In contrast, we focus on LiDAR-Camera 3D perception models and introduce a unified pretraining strategy, NeRF-Supervised Masked Auto Encoder (NS-MAE), which optimizes all modalities through a shared formulation. NS-MAE leverages NeRF's ability to encode both appearance and geometry, enabling efficient masked reconstruction of multi-modal data. Specifically, embeddings are extracted from corrupted LiDAR point clouds and images, conditioned on view directions and locations. Then, these embeddings are rendered into multi-modal feature maps from two crucial viewpoints for 3D driving…
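The abstract says embeddings are rendered into multi-modal maps using NeRF's joint encoding of appearance and geometry. As a rough illustration of that rendering step only, here is a minimal sketch of the standard NeRF volume-rendering quadrature for a single ray, producing both a color and a depth value. It is not the authors' implementation: the function name `render_ray`, its arguments, and the choice to reuse the previous spacing for the last sample are all assumptions made for this example.

```python
import math

def render_ray(sigmas, colors, depths):
    """Composite samples along one ray with the standard NeRF quadrature.

    sigmas: non-negative densities at each sample (hypothetical inputs)
    colors: list of (r, g, b) tuples, one per sample
    depths: strictly increasing sample distances along the ray
    Returns (rgb, depth, weights).
    """
    n = len(sigmas)
    transmittance = 1.0  # fraction of light not yet absorbed
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    weights = []
    for i in range(n):
        # Spacing to the next sample; the last sample reuses the previous
        # spacing (an assumption of this sketch).
        if i + 1 < n:
            delta = depths[i + 1] - depths[i]
        elif n > 1:
            delta = depths[i] - depths[i - 1]
        else:
            delta = 1.0
        alpha = 1.0 - math.exp(-sigmas[i] * delta)  # opacity of this segment
        w = transmittance * alpha                   # contribution weight
        weights.append(w)
        for k in range(3):
            rgb[k] += w * colors[i][k]              # appearance (RGB)
        depth += w * depths[i]                      # geometry (expected depth)
        transmittance *= 1.0 - alpha
    return rgb, depth, weights

# Usage: an opaque green surface at distance 2 hides the blue sample behind it.
rgb, depth, weights = render_ray(
    sigmas=[0.0, 1000.0, 5.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    depths=[1.0, 2.0, 3.0],
)
```

The same weights drive both outputs, which is why a single field can supervise a camera-like target (color) and a LiDAR-like target (depth) with one formulation.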
Taxonomy
Topics: Neural Networks and Applications · Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques
