Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Zetong Yang, Li Chen, Yanan Sun, Hongyang Li

TL;DR
This paper introduces visual point cloud forecasting as a new pre-training task for autonomous driving, enabling models to better understand 3D geometry and temporal dynamics, leading to improved downstream perception and planning performance.
Contribution
It proposes a novel pre-training task and a general model, ViDAR, that jointly learns semantics, 3D structures, and temporal information for autonomous driving.
Findings
3.1% NDS (nuScenes Detection Score) improvement on 3D detection
~10% error reduction in motion forecasting
~15% collision rate reduction in planning
Abstract
In contrast to extensive studies on general vision, pre-training for scalable visual autonomous driving remains seldom explored. Visual autonomous driving applications require features that encompass semantics, 3D geometry, and temporal information simultaneously for joint perception, prediction, and planning, which poses dramatic challenges for pre-training. To resolve this, we propose a new pre-training task termed visual point cloud forecasting: predicting future point clouds from historical visual input. The key merit of this task is that it captures the synergic learning of semantics, 3D structures, and temporal dynamics, and hence it shows superiority in various downstream tasks. To cope with this new problem, we present ViDAR, a general model to pre-train downstream visual encoders. It first extracts historical embeddings with the encoder. These representations are then transformed to 3D geometric…
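To make the forecasting task concrete, the sketch below scores predicted future point clouds against ground-truth ones. The text above does not say how the two are compared; Chamfer distance is assumed here only because it is a common choice for point cloud comparison, and the function names chamfer_distance and forecasting_error are hypothetical, not taken from ViDAR.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumption: this metric is illustrative; the source does not name one.
    """
    if not pred or not target:
        raise ValueError("both point clouds must be non-empty")

    def mean_nearest(src: Sequence[Point], dst: Sequence[Point]) -> float:
        # For each source point, take the distance to its nearest neighbour
        # in the other set, then average over the source points.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return mean_nearest(pred, target) + mean_nearest(target, pred)


def forecasting_error(
    pred_frames: Sequence[Sequence[Point]],
    target_frames: Sequence[Sequence[Point]],
) -> float:
    """Average Chamfer distance over the forecast future timesteps."""
    if not pred_frames or len(pred_frames) != len(target_frames):
        raise ValueError("need the same non-zero number of frames")
    pairs = zip(pred_frames, target_frames)
    return sum(chamfer_distance(p, t) for p, t in pairs) / len(pred_frames)


# Toy example: one future frame with two points each.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
error = forecasting_error([predicted], [ground_truth])
```

In a pre-training setup of this kind, the predicted frames would come from a model that sees only past camera images, while the ground-truth frames would come from the LiDAR sweeps recorded at the corresponding future timesteps. The brute-force nearest-neighbour search is quadratic in the number of points and is meant for illustration only.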
Taxonomy
Topics: Advanced Vision and Imaging · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications
