DLWM: Dual Latent World Models enable Holistic Gaussian-centric Pre-training in Autonomous Driving
Yiyao Zhu, Ying Xue, Haiming Zhang, Guangfeng Jiang, Wending Zhou, Xu Yan, Jiantao Gao, Yingjie Cai, Bingbing Liu, Zhen Li, Shaojie Shen

TL;DR
DLWM introduces dual latent world models for holistic Gaussian-centric pre-training in autonomous driving, using a two-stage self-supervised approach to improve 3D occupancy perception, 4D forecasting, and motion planning.
Contribution
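It proposes a dual latent world model paradigm for holistic Gaussian-centric pre-training in autonomous driving. The first stage predicts 3D Gaussians by reconstructing multi-view semantic and depth images; the second stage trains two latent world models separately for temporal feature learning.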
Findings
Significant performance improvements on SurroundOcc and nuScenes benchmarks.
Enhanced 3D occupancy perception and 4D forecasting accuracy.
Improved motion planning capabilities in autonomous driving scenarios.
Abstract
Vision-based autonomous driving has gained much attention due to its low cost and excellent performance. Compared with dense BEV (Bird's Eye View) or sparse query models, the Gaussian-centric method offers a comprehensive yet sparse representation, describing the scene with 3D semantic Gaussians. In this paper, we introduce DLWM, a novel paradigm with Dual Latent World Models specifically designed to enable holistic Gaussian-centric pre-training in autonomous driving in two stages. In the first stage, DLWM predicts 3D Gaussians from queries through self-supervised reconstruction of multi-view semantic and depth images. In the second stage, equipped with fine-grained contextual features, two latent world models are trained separately for temporal feature learning, including Gaussian-flow-guided latent prediction for downstream occupancy perception and forecasting tasks, and ego-planning-guided latent…
