GPD-1: Generative Pre-training for Driving

Zixun Xie; Sicheng Zuo; Wenzhao Zheng; Yunpeng Zhang; Dalong Du; Jie Zhou; Jiwen Lu; Shanghang Zhang

arXiv:2412.08643·cs.CV·December 12, 2024

TL;DR

GPD-1 is a unified autoregressive transformer pre-trained on large-scale driving data. It performs multiple autonomous driving tasks, such as scene generation, prediction, and planning, without additional fine-tuning.

Contribution

This paper introduces GPD-1, a unified pre-training framework that handles multiple autonomous driving tasks with a single model, using a token-based scene representation and an autoregressive transformer architecture.

Findings

01 GPD-1 effectively generalizes to multiple tasks without fine-tuning.

02 The model achieves competitive results on scene generation and motion prediction.

03 Pre-training on nuPlan enhances the model's understanding of driving scenarios.

Abstract

Modeling the evolution of driving scenarios is important for the evaluation and decision-making of autonomous driving systems. Most existing methods focus on one aspect of scene evolution, such as map generation, motion prediction, or trajectory planning. In this paper, we propose a unified Generative Pre-training for Driving (GPD-1) model to accomplish all these tasks together without additional fine-tuning. We represent each scene with ego, agent, and map tokens and formulate autonomous driving as a unified token generation problem. We adopt the autoregressive transformer architecture and use a scene-level attention mask to enable intra-scene bi-directional interactions. For the ego and agent tokens, we propose a hierarchical positional tokenizer to effectively encode both 2D positions and headings. For the map tokens, we train a map vector-quantized autoencoder to efficiently…
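
The scene-level attention mask mentioned in the abstract can be illustrated with a small sketch. The abstract only states that the mask enables bi-directional interactions within a scene; the sketch below additionally assumes that every scene has the same number of tokens and that attention across scenes is causal (a token sees its own scene and all earlier scenes). The function name `scene_level_mask` and its parameters are hypothetical and are not taken from the paper or its code.

```python
def scene_level_mask(tokens_per_scene, num_scenes):
    """Build a block-causal attention mask over a flattened token sequence.

    mask[i][j] is True when token i may attend to token j.
    Assumption (not from the paper): each scene contributes a fixed
    number of tokens, laid out contiguously in scene order.
    """
    total = tokens_per_scene * num_scenes
    mask = []
    for i in range(total):
        scene_i = i // tokens_per_scene  # scene index of the query token
        row = []
        for j in range(total):
            scene_j = j // tokens_per_scene  # scene index of the key token
            # Same scene: allowed in both directions (bi-directional).
            # Earlier scene: allowed (causal across scenes).
            # Later scene: blocked.
            row.append(scene_j <= scene_i)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Two scenes with three tokens each (e.g. ego, agent, map).
    for row in scene_level_mask(tokens_per_scene=3, num_scenes=2):
        print("".join("1" if allowed else "0" for allowed in row))
    # prints:
    # 111000
    # 111000
    # 111000
    # 111111
    # 111111
    # 111111
```

Compared with a standard token-level causal mask, which is strictly lower-triangular, this mask is triangular only at the granularity of whole scenes, so tokens describing the same time step can exchange information freely.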

Taxonomy

Topics: Older Adults Driving Studies · demographic modeling and climate adaptation

Methods: Softmax · Attention Is All You Need · ADaptive gradient method with the OPTimal convergence rate · Focus