Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Zetong Yang; Li Chen; Yanan Sun; Hongyang Li

arXiv:2312.17655 · cs.CV · January 1, 2024 · 1 citation

TL;DR

This paper introduces visual point cloud forecasting as a new pre-training task for autonomous driving, enabling models to better understand 3D geometry and temporal dynamics, leading to improved downstream perception and planning performance.

Contribution

It proposes a novel pre-training task and a general model, ViDAR, that jointly learns semantics, 3D structures, and temporal information for autonomous driving.

Findings

1. 3.1% NDS (nuScenes Detection Score) improvement on 3D detection
2. ~10% error reduction in motion forecasting
3. ~15% collision rate reduction in planning

Abstract

In contrast to extensive studies on general vision, pre-training for scalable visual autonomous driving remains seldom explored. Visual autonomous driving applications require features encompassing semantics, 3D geometry, and temporal information simultaneously for joint perception, prediction, and planning, posing dramatic challenges for pre-training. To resolve this, we introduce a new pre-training task termed visual point cloud forecasting: predicting future point clouds from historical visual input. The key merit of this task is that it captures the synergistic learning of semantics, 3D structures, and temporal dynamics. Hence it shows superiority in various downstream tasks. To cope with this new problem, we present ViDAR, a general model to pre-train downstream visual encoders. It first extracts historical embeddings with the encoder. These representations are then transformed to 3D geometric…
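
Forecasted point clouds are commonly compared with ground-truth LiDAR sweeps using a Chamfer-style distance. The sketch below is an illustration under stated assumptions. It implements one common variant (mean-reduced, squared L2) in pure Python with brute-force nearest neighbours. It is not taken from the ViDAR repository, the function names are hypothetical, and the paper's actual training loss and evaluation metric may differ.

```python
from typing import Sequence, Tuple

# Hypothetical type alias: a point is an (x, y, z) tuple.
Point = Tuple[float, float, float]


def _mean_nn_sq_dist(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Mean squared L2 distance from each point in src to its nearest point in dst.

    Brute force, O(|src| * |dst|): fine for a sketch, too slow for full LiDAR sweeps.
    """
    total = 0.0
    for p in src:
        total += min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in dst)
    return total / len(src)


def chamfer_distance(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed variant: mean reduction, squared L2)."""
    if not pred or not gt:
        raise ValueError("both point clouds must be non-empty")
    return _mean_nn_sq_dist(pred, gt) + _mean_nn_sq_dist(gt, pred)


def mean_chamfer_over_horizon(
    pred_frames: Sequence[Sequence[Point]],
    gt_frames: Sequence[Sequence[Point]],
) -> float:
    """Average Chamfer distance over the forecast future timesteps t+1 ... t+T."""
    if not pred_frames or len(pred_frames) != len(gt_frames):
        raise ValueError("need the same, non-zero number of predicted and ground-truth frames")
    return sum(chamfer_distance(p, g) for p, g in zip(pred_frames, gt_frames)) / len(pred_frames)


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
    print(chamfer_distance(pred, gt))  # 1.0
    print(mean_chamfer_over_horizon([pred, gt], [gt, gt]))  # 0.5
```

The mean reduction keeps scores comparable across point clouds of different sizes. For real sweeps, the brute-force loop would usually be replaced by a spatial index (for example a k-d tree) or a batched tensor operation.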

Code & Models

Repositories

opendrivelab/vidar (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications