DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning

Yang Zhou; Xiaofeng Wang; Hao Shao; Letian Wang; Guosheng Zhao; Jiangnan Shao; Jiagang Zhu; Tingdong Yu; Zheng Zhu; Guan Huang; Steven L. Waslander

arXiv:2604.01765 · cs.CV · April 3, 2026

TL;DR

DriveDreamer-Policy is a unified, geometry-aware world-action model for autonomous driving that integrates depth generation, future video prediction, and motion planning, achieving state-of-the-art results on the NAVSIM benchmarks.

Contribution

It introduces a modular architecture combining depth generation, video prediction, and planning guided by a geometry-aware world representation.

Findings

01

Achieves 89.2 PDMS on NAVSIM v1 and 88.7 EPDMS on NAVSIM v2.

02

Outperforms existing world-model-based approaches in planning and world generation.

03

Explicit depth learning enhances video imagination and planning robustness.
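The PDMS number in the first finding is NAVSIM v1's PDM Score. As we understand the NAVSIM definition, each scenario's score multiplies two penalty terms, no at-fault collisions (NC) and drivable-area compliance (DAC), by a weighted average of ego progress (EP), time-to-collision (TTC), and comfort (C) with weights 5, 5, and 2; the benchmark reports the mean over scenarios on a 0-100 scale. The minimal sketch below assumes that definition (check it against the NAVSIM documentation) and does not cover v2's EPDMS, which adds further terms. The function names `pdm_score` and `benchmark_pdms` are illustrative, not part of any official toolkit.

```python
from typing import Iterable

# Assumed NAVSIM v1 weights for the averaged sub-scores (EP, TTC, comfort).
WEIGHTS = {"ep": 5.0, "ttc": 5.0, "comfort": 2.0}


def pdm_score(nc: float, dac: float, ep: float, ttc: float, comfort: float) -> float:
    """Hypothetical per-scenario PDM Score: multiplicative penalties times a weighted average."""
    for name, value in {"nc": nc, "dac": dac, "ep": ep, "ttc": ttc, "comfort": comfort}.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    weighted = (
        WEIGHTS["ep"] * ep + WEIGHTS["ttc"] * ttc + WEIGHTS["comfort"] * comfort
    ) / sum(WEIGHTS.values())
    # NC and DAC act as gates: a zero in either zeroes the whole scenario.
    return nc * dac * weighted


def benchmark_pdms(scores: Iterable[float]) -> float:
    """Hypothetical benchmark-level number: mean per-scenario score scaled to 0-100."""
    scores = list(scores)
    return 100.0 * sum(scores) / len(scores)
```

For instance, a scenario with no infractions but only half the reference progress scores (5·0.5 + 5 + 2)/12 ≈ 0.79, while setting NC = 0 zeroes the scenario regardless of the other terms.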

Abstract

Recently, world-action models (WAMs) have emerged to bridge vision-language-action (VLA) models and world models, unifying the reasoning and instruction-following capabilities of the former with the spatio-temporal world modeling of the latter. However, existing WAM approaches often focus on modeling 2D appearance or latent representations, with limited geometric grounding, an essential element for embodied systems operating in the physical world. We present DriveDreamer-Policy, a unified driving world-action model that integrates depth generation, future video generation, and motion planning within a single modular architecture. The model employs a large language model to process language instructions, multi-view images, and actions, followed by three lightweight generators that produce depth, future video, and actions. By learning a geometry-aware world representation and using it to guide both future prediction and…
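The data flow described in the abstract can be sketched as one forward pass. This is a toy illustration under loudly stated assumptions, not the authors' implementation: `backbone`, `depth_generator`, `video_generator`, and `action_generator` are hypothetical names, features are plain lists of floats, and each "network" is a trivial arithmetic stand-in. The only structure taken from the text is that one shared model ingests instructions, multi-view images, and actions, and that three lightweight heads produce depth, future video, and actions. The sketch further assumes that the geometry (here, the predicted depth) conditions both future-video generation and planning; whether the generated video also feeds the planner is not stated in the abstract, so that link is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Features = List[float]  # toy stand-in for a token/feature tensor


@dataclass
class Inputs:
    instruction: str                  # language instruction (ignored by this toy fusion)
    multiview_images: List[Features]  # one toy feature vector per camera
    past_actions: Features            # e.g. flattened past waypoints


def backbone(x: Inputs) -> Features:
    """Hypothetical stand-in for the LLM backbone: fuse inputs into one shared state."""
    n_views = len(x.multiview_images)
    fused = [sum(vals) / n_views for vals in zip(*x.multiview_images)]  # average over cameras
    bias = sum(x.past_actions) / max(len(x.past_actions), 1)            # fold in past actions
    return [f + bias for f in fused]


def depth_generator(h: Features) -> Features:
    """Hypothetical lightweight head: a strictly positive toy depth value per location."""
    return [abs(v) + 1.0 for v in h]


def video_generator(h: Features, depth: Features, horizon: int = 3) -> List[Features]:
    """Hypothetical head: roll out future toy 'frames' conditioned on the state AND the depth."""
    frames, frame = [], h
    for _ in range(horizon):
        frame = [f + 0.1 / d for f, d in zip(frame, depth)]  # toy update that depends on depth
        frames.append(frame)
    return frames


def action_generator(h: Features, depth: Features, n_waypoints: int = 4) -> List[Tuple[float, float]]:
    """Hypothetical planning head: straight-line (x, y) waypoints capped by toy clearance."""
    speed = max(sum(h) / len(h), 0.0)
    step = min(speed, min(depth))  # never step farther than the nearest toy depth value
    return [(step * (k + 1), 0.0) for k in range(n_waypoints)]


def forward(x: Inputs):
    h = backbone(x)                    # shared representation
    depth = depth_generator(h)         # geometry is predicted first
    video = video_generator(h, depth)  # depth-conditioned future prediction
    plan = action_generator(h, depth)  # depth-conditioned planning
    return depth, video, plan
```

For example, `forward(Inputs("turn left", [[0.2, 0.4], [0.6, 0.8]], [1.0, 1.0]))` returns a two-element depth list, three future frames, and four waypoints. The design choice the sketch mirrors is ordering: geometry is predicted once from the shared state and then reused by both downstream heads.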

Code & Models

Repositories

youngzhou1999/DriveDreamer-Policy (GitHub)