ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos
Xiaodong Wang, Peixi Peng

TL;DR
ProphetDWM is a driving world model that jointly predicts future videos and actions. By learning action dynamics and state transitions, it produces more accurate, longer-horizon predictions for autonomous driving.
Contribution
It introduces a novel end-to-end model with an action module and a diffusion-based transition module, improving long-term video and action prediction in autonomous driving.
Findings
Achieves state-of-the-art video consistency
Provides the best action prediction accuracy
Enables high-quality long-term video and action generation
Abstract
Real-world driving requires people to observe the current environment, anticipate the future, and make appropriate driving decisions. This requirement is well aligned with the capabilities of world models, which understand the environment and predict the future. However, recent world models in autonomous driving are built explicitly: they predict the future through controllable driving video generation. We argue that driving world models should have two additional abilities: action control and action prediction. From this perspective, previous methods are limited because predicting the video requires given actions of the same length as the video, and they ignore the dynamical laws governing actions. To address these issues, we propose ProphetDWM, a novel end-to-end driving world model that jointly predicts future videos and actions. Our world model has an action module to learn latent action from…
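The interface difference the abstract describes can be illustrated with a toy sketch. Everything below is hypothetical: the two-number state, the `transition` and `action_module` functions, and their dynamics are simple stand-ins, not ProphetDWM's latent action module or its diffusion-based transition. The sketch only contrasts action-conditioned generation, where the caller must supply one action per predicted step, with a joint rollout, where the model proposes its own action at each step and so needs only an initial state and a horizon.

```python
from typing import List, Tuple

# Hypothetical toy types: a "state" stands in for a latent video frame,
# an "action" stands in for a latent driving action.
State = Tuple[float, float]  # (position, speed)
Action = float               # acceleration


def transition(state: State, action: Action, dt: float = 0.1) -> State:
    """Toy deterministic transition (not the paper's diffusion-based model)."""
    pos, speed = state
    new_speed = speed + action * dt
    return (pos + new_speed * dt, new_speed)


def action_module(state: State, target_speed: float = 10.0, gain: float = 0.5) -> Action:
    """Toy policy that accelerates toward a target speed (hypothetical)."""
    _, speed = state
    return gain * (target_speed - speed)


def rollout_with_given_actions(s0: State, actions: List[Action]) -> List[State]:
    """Action-conditioned generation: the horizon is fixed by len(actions)."""
    states = [s0]
    for a in actions:
        states.append(transition(states[-1], a))
    return states[1:]


def joint_rollout(s0: State, horizon: int) -> Tuple[List[State], List[Action]]:
    """Joint prediction: the action module proposes each action before the transition."""
    states, actions = [s0], []
    for _ in range(horizon):
        a = action_module(states[-1])
        actions.append(a)
        states.append(transition(states[-1], a))
    return states[1:], actions


if __name__ == "__main__":
    states, actions = joint_rollout((0.0, 8.0), horizon=20)
    print(len(states), len(actions), round(states[-1][1], 3))
```

In this toy setting, feeding the actions produced by `joint_rollout` back into `rollout_with_given_actions` reproduces the same states, which shows that the joint model subsumes action-conditioned generation while no longer requiring the action sequence as input.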
Taxonomy
Topics: Simulation Techniques and Applications
