Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection

Mehar Khurana; Neehar Peri; James Hays; Deva Ramanan

arXiv:2406.10115·cs.CV·October 16, 2024

TL;DR

This paper introduces a shelf-supervised pre-training method for 3D object detection that leverages image-based foundation models to generate pseudo-labels from paired RGB and LiDAR data, improving detection accuracy especially with limited labeled data.

Contribution

The paper proposes a shelf-supervised approach that uses off-the-shelf image foundation models to generate zero-shot 3D bounding boxes as pseudo-labels for pre-training, improving semi-supervised detection performance.
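The idea of turning a 2D prediction into a 3D pseudo-label can be illustrated with a minimal sketch. It is not the authors' pipeline, which this summary does not describe in detail. It assumes the LiDAR points are already expressed in the camera frame, a pinhole intrinsic matrix `K`, and a boolean instance mask produced by some off-the-shelf 2D model. The function names `project_points` and `lift_mask_to_box` are hypothetical, and the axis-aligned box is a deliberate simplification.

```python
import numpy as np


def project_points(points_cam, K):
    """Project (N, 3) camera-frame points to pixels with a pinhole model.

    Returns (uv, in_front), where in_front marks points with positive depth.
    """
    z = points_cam[:, 2]
    in_front = z > 1e-6
    uvw = points_cam @ K.T
    # Avoid dividing by zero or negative depth; those points are masked out later.
    safe_z = np.where(in_front, z, 1.0)
    uv = uvw[:, :2] / safe_z[:, None]
    return uv, in_front


def lift_mask_to_box(points_cam, K, mask):
    """Fit an axis-aligned 3D box around LiDAR points that fall inside a 2D mask.

    Illustrative assumption: a real system would likely estimate orientation
    and filter background points; this sketch only takes the min/max extent.
    Returns (center, size) as length-3 arrays, or None if no point hits the mask.
    """
    h, w = mask.shape
    uv, in_front = project_points(points_cam, K)
    u = np.floor(uv[:, 0]).astype(int)
    v = np.floor(uv[:, 1]).astype(int)
    in_image = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    hit = np.zeros(len(points_cam), dtype=bool)
    hit[in_image] = mask[v[in_image], u[in_image]]
    if not hit.any():
        return None

    obj = points_cam[hit]
    lo, hi = obj.min(axis=0), obj.max(axis=0)
    return (lo + hi) / 2.0, hi - lo
```

Calling `lift_mask_to_box(points, K, mask)` once per 2D instance mask would yield one candidate 3D box per detected object. Boxes of this kind could then serve as pseudo-labels for pre-training a 3D detector.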

Findings

01. Significantly improves detection accuracy over prior self-supervised methods.

02. Effective for LiDAR-only, RGB-only, and multi-modal detectors.

03. Demonstrates superior results on the nuScenes and Waymo Open Dataset (WOD) benchmarks in limited-data scenarios.

Abstract

State-of-the-art 3D object detectors are often trained on massive labeled datasets. However, annotating 3D bounding boxes remains prohibitively expensive and time-consuming, particularly for LiDAR. Instead, recent works demonstrate that self-supervised pre-training with unlabeled data can improve detection accuracy with limited labels. Contemporary methods adapt best practices for self-supervised learning from the image domain to point clouds (such as contrastive learning). However, publicly available 3D datasets are considerably smaller and less diverse than those used for image-based self-supervised learning, limiting their effectiveness. We do note, however, that such 3D data is naturally collected in a multimodal fashion, often paired with images. Rather than pre-training with only self-supervised objectives, we argue that it is better to bootstrap point cloud representations using…

Peer Reviews

Decision·CoRL 2024

Reviewer 01 · Rating 3 · Confidence 3

Reviewer 02 · Rating 1 · Confidence 4

Reviewer 03 · Rating 1 · Confidence 5

Code & Models

Repositories

meharkhurana03/cm3d
PyTorch · Official

Taxonomy

Topics · Advanced Neural Network Applications