Geometric-aware Pretraining for Vision-centric 3D Object Detection

Linyan Huang; Huijie Wang; Jia Zeng; Shengchuan Zhang; Liujuan Cao; Junchi Yan; Hongyang Li

arXiv:2304.03105·cs.CV·April 10, 2023·5 cites

TL;DR

This paper introduces GAPretrain, a geometric-aware pretraining framework that enhances vision-based 3D object detection by effectively leveraging spatial and structural cues from LiDAR and images, improving performance on autonomous driving benchmarks.

Contribution

We propose a novel pretraining method that incorporates geometric and structural cues using a unified BEV representation to improve view transformation and spatial feature extraction in 3D detection.

Findings

1. Achieves 46.2 mAP and 55.5 NDS on the nuScenes val set with BEVFormer.
2. Demonstrates improved performance across various backbones and view transformations.
3. Serves as a flexible plug-and-play solution for state-of-the-art detectors.
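The NDS figure in the first finding is the nuScenes detection score. It combines mAP with five true-positive error metrics (translation, scale, orientation, velocity and attribute error), each clipped at 1, as NDS = (5 · mAP + Σ(1 − min(1, err))) / 10. A minimal Python sketch of that published formula follows. The function name and dictionary layout are illustrative choices and are not the API of the official nuScenes devkit.

```python
def nuscenes_detection_score(mAP, tp_errors):
    """Compute NDS from mAP and the five nuScenes TP error metrics.

    mAP and the errors are fractions (e.g. 0.462 rather than 46.2).
    tp_errors: dict with keys mATE, mASE, mAOE, mAVE, mAAE (illustrative layout).
    """
    expected = {"mATE", "mASE", "mAOE", "mAVE", "mAAE"}
    if set(tp_errors) != expected:
        raise ValueError(f"expected TP metrics {sorted(expected)}")
    # Each error is clipped at 1, so a very poor metric contributes 0, never a negative score.
    tp_score = sum(1.0 - min(1.0, err) for err in tp_errors.values())
    # mAP carries weight 5; the five TP terms carry weight 1 each; total weight 10.
    return (5.0 * mAP + tp_score) / 10.0
```

Because mAP gets half of the total weight, two detectors with the same mAP can still differ in NDS through their localization, orientation and velocity errors.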

Abstract

Multi-camera 3D object detection for autonomous driving is a challenging problem that has garnered notable attention from both academia and industry. An obstacle encountered in vision-based techniques involves the precise extraction of geometry-conscious features from RGB images. Recent approaches have utilized geometric-aware image backbones pretrained on depth-relevant tasks to acquire spatial information. However, these approaches overlook the critical aspect of view transformation, resulting in inadequate performance due to the misalignment of spatial knowledge between the image backbone and view transformation. To address this issue, we propose a novel geometric-aware pretraining framework called GAPretrain. Our approach incorporates spatial and structural cues to camera networks by employing the geometric-rich modality as guidance during the pretraining phase. The transference of…
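The abstract describes using the geometric-rich modality (LiDAR) as guidance for the camera network during pretraining, with knowledge transferred through a shared BEV representation. One generic way to express such guidance is a feature-imitation loss between camera and LiDAR BEV feature maps defined on the same grid. The NumPy sketch below assumes that form. It is not presented as GAPretrain's actual objective, and the foreground mask, its weight and the function name are hypothetical.

```python
import numpy as np


def bev_imitation_loss(cam_bev, lidar_bev, fg_mask=None, fg_weight=2.0):
    """Weighted MSE between camera (student) and LiDAR (teacher) BEV features.

    Generic feature-imitation sketch; an assumption, not GAPretrain's exact loss.
    cam_bev, lidar_bev: arrays of shape (C, H, W) on the same BEV grid.
    fg_mask: optional bool array (H, W) of foreground cells (hypothetical).
    fg_weight: hypothetical extra weight applied to foreground cells.
    """
    if cam_bev.shape != lidar_bev.shape:
        raise ValueError("camera and LiDAR BEV features must share one grid")
    # Squared error per BEV cell, averaged over channels -> shape (H, W).
    per_cell = ((cam_bev - lidar_bev) ** 2).mean(axis=0)
    # Uniform weights, raised on foreground cells when a mask is supplied.
    weights = np.ones_like(per_cell)
    if fg_mask is not None:
        weights = np.where(fg_mask, fg_weight, 1.0)
    # Normalize by total weight so the loss scale does not depend on grid size.
    # In training, the LiDAR teacher would be frozen and only the camera
    # branch updated; this function only computes the scalar value.
    return float((weights * per_cell).sum() / weights.sum())
```

Imitating features in BEV space, rather than in the image backbone alone, is one way to make the supervision reach the view-transformation step, which the abstract identifies as the part earlier depth-pretraining approaches overlook.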

Code & Models

Repositories

opendrivelab/bevperception-survey-recipe
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Visual Attention and Saliency Detection