VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving

Haiming Zhang; Wending Zhou; Yiyao Zhu; Xu Yan; Jiantao Gao; Dongfeng Bai; Yingjie Cai; Bingbing Liu; Shuguang Cui; Zhen Li

arXiv:2411.14716·cs.CV·May 23, 2025

TL;DR

VisionPAD is a self-supervised pre-training method for vision-centric autonomous driving. It uses 3D Gaussian Splatting and multi-frame photometric consistency to improve 3D perception with images as the only supervision.

Contribution

Introduces a self-supervised pre-training paradigm for autonomous driving perception that combines 3D Gaussian Splatting, self-supervised voxel velocity estimation, and multi-frame photometric consistency.

Findings

1. Outperforms state-of-the-art pre-training methods in 3D detection.
2. Enhances occupancy prediction accuracy.
3. Improves map segmentation results.

Abstract

This paper introduces VisionPAD, a novel self-supervised pre-training paradigm designed for vision-centric algorithms in autonomous driving. In contrast to previous approaches that employ neural rendering with explicit depth supervision, VisionPAD utilizes more efficient 3D Gaussian Splatting to reconstruct multi-view representations using only images as supervision. Specifically, we introduce a self-supervised method for voxel velocity estimation. By warping voxels to adjacent frames and supervising the rendered outputs, the model effectively learns motion cues in the sequential data. Furthermore, we adopt a multi-frame photometric consistency approach to enhance geometric perception. It projects adjacent frames to the current frame based on rendered depths and relative poses, boosting the 3D geometric representation through pure image supervision. Extensive experiments on autonomous…
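The multi-frame photometric consistency idea in the abstract (relating the current frame to an adjacent frame through rendered depth and relative pose) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `reproject_pixels` and `photometric_l1`, the shared pinhole intrinsics `K`, the nearest-neighbour sampling, and the plain L1 error are all assumptions made for illustration. Practical pipelines typically use differentiable bilinear sampling and may add further photometric terms.

```python
import numpy as np


def reproject_pixels(depth, K, T_cur_to_adj):
    """Map every pixel of the current frame into the adjacent frame.

    depth:        (H, W) depth of the current frame (e.g. a rendered depth map)
    K:            (3, 3) pinhole intrinsics, assumed shared by both frames
    T_cur_to_adj: (4, 4) rigid transform from current to adjacent camera
    Returns an (H, W, 2) array of (u, v) coordinates in the adjacent frame.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel grid, each (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)

    # Back-project pixels to 3D points in the current camera frame.
    rays = pix @ np.linalg.inv(K).T
    pts_cur = rays * depth[..., None]

    # Move the points into the adjacent camera frame using the relative pose.
    R, t = T_cur_to_adj[:3, :3], T_cur_to_adj[:3, 3]
    pts_adj = pts_cur @ R.T + t

    # Project onto the adjacent image plane (guard against division by zero).
    proj = pts_adj @ K.T
    z = np.clip(proj[..., 2:3], 1e-6, None)
    return proj[..., :2] / z


def photometric_l1(cur_img, adj_img, depth, K, T_cur_to_adj):
    """Mean absolute error between the current image and the adjacent image
    sampled at the reprojected locations (nearest neighbour, for simplicity)."""
    H, W = depth.shape
    uv = np.rint(reproject_pixels(depth, K, T_cur_to_adj)).astype(int)
    u, v = uv[..., 0], uv[..., 1]
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)  # drop out-of-view pixels
    if not valid.any():
        return 0.0
    warped = adj_img[v[valid], u[valid]]
    return float(np.abs(warped - cur_img[valid]).mean())
```

With an identity pose and identical images the error is zero. An incorrect depth moves the sampled locations away from the true correspondences and generally raises the error, which is what lets image data alone act as a geometric training signal.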

Taxonomy

Topics: Robotic Path Planning Algorithms

Methods: ADaptive gradient method with the OPTimal convergence rate