Bridging Scene Generation and Planning: Driving with World Model via Unifying Vision and Motion Representation
Xingtai Gui, Meijie Zhang, Tianyi Yan, Wencheng Han, Jiahao Gong, Feiyang Tan, Cheng-zhong Xu, Jianbing Shen

TL;DR
This paper introduces WorldDrive, a unified framework that combines scene generation and planning by integrating vision and motion representations, leading to improved autonomous driving performance.
Contribution
The paper proposes a novel Trajectory-aware Driving World Model and a Future-aware Rewarder to unify visual and motion representations for better scene understanding and planning.
Findings
Achieves state-of-the-art planning performance on benchmarks.
Generates diverse, plausible future scenes conditioned on trajectories.
Maintains high-fidelity action-controlled video generation.
Abstract
End-to-end autonomous driving aims to generate safe and plausible planning policies from raw sensor input. Driving world models have shown great potential in learning rich representations by predicting the future evolution of a driving scene. However, existing driving world models primarily focus on visual scene representation, and motion representation is not explicitly designed to be planner-shared and inheritable, leaving a schism between the optimization of visual scene generation and the requirements of precise motion planning. We present WorldDrive, a holistic framework that couples scene generation and real-time planning via unifying vision and motion representation. We first introduce a Trajectory-aware Driving World Model, which conditions on a trajectory vocabulary to enforce consistency between visual dynamics and motion intentions, enabling the generation of diverse and…
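The abstract says the Trajectory-aware Driving World Model conditions on a trajectory vocabulary. It does not say how a continuous trajectory is matched to a vocabulary entry. The sketch below shows one common convention from vocabulary-based planners: snapping a trajectory to its nearest anchor by mean waypoint distance. This is an assumption for illustration only. The helper names, the distance metric, and the toy three-entry vocabulary are hypothetical and are not WorldDrive's implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types: a trajectory is a fixed-length sequence of (x, y) waypoints.
Waypoint = Tuple[float, float]
Trajectory = Sequence[Waypoint]


def mean_waypoint_distance(a: Trajectory, b: Trajectory) -> float:
    """Mean Euclidean distance between corresponding waypoints (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of waypoints")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def nearest_vocabulary_index(vocab: List[Trajectory], traj: Trajectory) -> int:
    """Index of the vocabulary anchor closest to `traj` (hypothetical helper)."""
    return min(range(len(vocab)), key=lambda k: mean_waypoint_distance(vocab[k], traj))


# Toy vocabulary with x pointing forward: go straight, veer left, veer right.
VOCAB: List[Trajectory] = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)],     # 0: straight
    [(1.0, 0.5), (2.0, 1.0), (3.0, 1.5), (4.0, 2.0)],     # 1: left
    [(1.0, -0.5), (2.0, -1.0), (3.0, -1.5), (4.0, -2.0)], # 2: right
]

if __name__ == "__main__":
    observed = [(1.0, 0.1), (2.0, 0.1), (3.0, 0.2), (4.0, 0.2)]
    # The selected anchor index could then serve as the conditioning signal.
    print(nearest_vocabulary_index(VOCAB, observed))
```

Under this assumption, a discrete anchor index yields a bounded set of motion intentions. The same index could condition scene generation and define the planner's candidate set, which is one way of making the motion representation shared between the two.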
Taxonomy
Topics: Robotic Path Planning Algorithms · Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety
