Cross-Modal Self-Supervised Learning with Effective Contrastive Units for LiDAR Point Clouds

Mu Cai; Chenxu Luo; Yong Jae Lee; Xiaodong Yang

arXiv:2409.06827·cs.CV·September 12, 2024

TL;DR

This paper introduces a cross-modal contrastive learning approach for LiDAR point clouds, leveraging camera data to improve 3D perception tasks in autonomous driving, with novel contrastive units tailored for LiDAR data.

Contribution

It systematically studies multi-modality contrastive learning and proposes instance-aware, similarity-balanced units specifically designed for LiDAR point cloud pre-training.

Findings

01. Cross-modality contrastive learning outperforms both single-modality and multi-modality alternatives.

02. The proposed contrastive units significantly improve 3D detection and segmentation accuracy.

03. The approach achieves state-of-the-art results on multiple autonomous driving benchmarks.

Abstract

3D perception in LiDAR point clouds is crucial for a self-driving vehicle to act properly in a 3D environment. However, manually labeling point clouds is hard and costly. There has been growing interest in self-supervised pre-training of 3D perception models. Following the success of contrastive learning in images, current methods mostly conduct contrastive pre-training on point clouds only. Yet an autonomous driving vehicle is typically equipped with multiple sensors, including cameras and LiDAR. In this context, we systematically study single-modality, cross-modality, and multi-modality contrastive learning of point clouds, and show that cross-modality wins over the other alternatives. In addition, considering the huge difference between the training sources in 2D images and 3D point clouds, it remains unclear how to design more effective contrastive units for LiDAR. We therefore…
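As a rough illustration of what cross-modality contrastive learning means in general, the sketch below computes a standard InfoNCE loss between paired LiDAR and image feature vectors. It is a generic, minimal example and not the paper's method: the function names, the choice of one feature vector per contrastive unit, and the temperature value are all assumptions made for illustration. Row i of the LiDAR features is treated as the positive match for row i of the image features, and every other row as a negative.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_modal_info_nce(lidar_feats, image_feats, temperature=0.07):
    """Generic InfoNCE loss between two modalities (hypothetical helper).

    lidar_feats[i] and image_feats[i] are assumed to describe the same
    contrastive unit; all other image rows act as negatives for LiDAR row i.
    The temperature of 0.07 is an illustrative default, not a value from the paper.
    """
    z_lidar = [normalize(v) for v in lidar_feats]
    z_image = [normalize(v) for v in image_feats]

    total = 0.0
    for i, query in enumerate(z_lidar):
        # Cosine similarity of this LiDAR unit to every image unit, scaled by temperature.
        logits = [sum(a * b for a, b in zip(query, key)) / temperature
                  for key in z_image]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the matching image unit.
        total += log_denominator - logits[i]
    return total / len(z_lidar)


# Toy usage: correctly paired features should give a lower loss than mispaired ones.
lidar = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned_loss = cross_modal_info_nce(lidar, lidar)
misaligned_loss = cross_modal_info_nce(lidar, [lidar[1], lidar[2], lidar[0]])
```

In this toy setup the loss is small when each LiDAR feature matches its paired image feature and large when the pairing is scrambled, which is the signal a cross-modal pre-training objective of this kind relies on.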

Code & Models

Repositories

qcraftai/cross-modal-ssl (Official)

Taxonomy

Topics: Remote Sensing and LiDAR Applications · Image Processing and 3D Reconstruction

Methods: Contrastive Learning