Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving

Tengpeng Li; Hanli Wang; Xianfei Li; Wenlong Liao; Tao He; Pai Peng

arXiv:2501.08861·cs.CV·January 16, 2025

TL;DR

This paper introduces GPVL, a generative planning model with 3D-vision language pre-training that enhances perception, decision-making, and scene understanding for end-to-end autonomous driving, demonstrating superior performance and generalization on nuScenes.

Contribution

The paper presents a novel 3D-vision language pre-training framework combined with a cross-modal language model for autonomous driving, improving scene understanding and decision accuracy.

Findings

01 Achieves state-of-the-art performance on the nuScenes dataset.

02 Demonstrates strong generalization across scenarios.

03 Provides real-time decision-making capabilities.

Abstract

Autonomous driving is a challenging task that requires perceiving and understanding the surrounding environment for safe trajectory planning. While existing vision-based end-to-end models have achieved promising results, they still face challenges in visual understanding, decision reasoning, and scene generalization. To address these issues, a generative planning model with 3D-vision language pre-training, named GPVL, is proposed for end-to-end autonomous driving. The proposed paradigm has two significant aspects. On one hand, a 3D-vision language pre-training module is designed to bridge the gap between visual perception and linguistic understanding in the bird's eye view. On the other hand, a cross-modal language model is introduced to generate holistic driving decisions and fine-grained trajectories with perception and navigation information in an auto-regressive…

Code & Models

Repositories

ltp1995/gpvl (PyTorch, official)

Videos

Generative Planning with 3D-Vision Language Pre-training for End-to-End Autonomous Driving

Taxonomy

Topics: Robotic Path Planning Algorithms