DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers

Yuntao Chen; Yuqi Wang; Zhaoxiang Zhang

arXiv:2412.18607 · cs.CV · December 25, 2024

TL;DR

DrivingGPT unifies driving world modeling and planning using multimodal autoregressive transformers, enabling improved video generation and trajectory planning by modeling images and actions jointly.

Contribution

The paper introduces DrivingGPT, a novel multimodal transformer that combines world modeling and planning into a single sequence prediction framework.

Findings

1. Outperforms baselines on nuPlan and NAVSIM benchmarks.
2. Effective joint modeling of images and actions.
3. Enables both video generation and trajectory planning.

Abstract

World model-based searching and planning are widely recognized as a promising path toward human-level physical intelligence. However, current driving world models primarily rely on video diffusion models, which specialize in visual generation but lack the flexibility to incorporate other modalities like action. In contrast, autoregressive transformers have demonstrated exceptional capability in modeling multimodal data. Our work aims to unify both driving model simulation and trajectory planning into a single sequence modeling problem. We introduce a multimodal driving language based on interleaved image and action tokens, and develop DrivingGPT to learn joint world modeling and planning through standard next-token prediction. Our DrivingGPT demonstrates strong performance in both action-conditioned video generation and end-to-end planning, outperforming strong baselines on large-scale…
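To make the idea of an interleaved image and action token sequence concrete, the following is a minimal sketch, not the authors' implementation. The vocabulary sizes, the per-frame ordering (image tokens first, then action tokens), and the helper names `interleave` and `next_token_pairs` are all hypothetical; the abstract does not specify the tokenizers or the sequence layout.

```python
from typing import List, Tuple

# Hypothetical vocabulary sizes; the abstract does not state the real ones.
IMAGE_VOCAB = 1024
ACTION_VOCAB = 128


def interleave(frames: List[List[int]], actions: List[List[int]]) -> List[int]:
    """Build one flat token sequence: frame_1, action_1, frame_2, action_2, ...

    Action ids are shifted by IMAGE_VOCAB so both modalities share a single
    vocabulary without collisions (an assumption made for this sketch).
    """
    if len(frames) != len(actions):
        raise ValueError("expected one action token group per frame")
    sequence: List[int] = []
    for image_tokens, action_tokens in zip(frames, actions):
        if any(not 0 <= t < IMAGE_VOCAB for t in image_tokens):
            raise ValueError("image token id out of range")
        if any(not 0 <= a < ACTION_VOCAB for a in action_tokens):
            raise ValueError("action token id out of range")
        sequence.extend(image_tokens)
        sequence.extend(IMAGE_VOCAB + a for a in action_tokens)
    return sequence


def next_token_pairs(sequence: List[int]) -> Tuple[List[int], List[int]]:
    """Return (inputs, targets) for next-token prediction: targets are inputs shifted by one."""
    return sequence[:-1], sequence[1:]


# Two frames with two image tokens each, and one action token per frame.
seq = interleave(frames=[[1, 2], [3, 4]], actions=[[0], [5]])
inputs, targets = next_token_pairs(seq)
```

Under this layout a single autoregressive model sees both modalities in one stream, so predicting the next image tokens corresponds to world modeling and predicting the next action tokens corresponds to planning.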

Taxonomy

Topics: Data Management and Algorithms · Automated Road and Building Extraction · Autonomous Vehicle Technology and Safety

Methods: Diffusion