TL;DR
This paper introduces GaussianWorld, a world-model-based framework for autonomous driving that recasts 3D occupancy prediction as 4D occupancy forecasting and models scene evolution explicitly with a Gaussian world model, improving occupancy prediction accuracy without additional computation.
Contribution
It reformulates 3D occupancy prediction as a 4D forecasting problem and employs a Gaussian world model to explicitly leverage scene evolution priors.
Findings
Over 2% improvement in mIoU on the nuScenes dataset
Effective scene evolution modeling without additional computational cost
Outperforms single-frame methods in 3D occupancy prediction
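For readers unfamiliar with the metric behind the first finding, mIoU (mean intersection over union) averages the per-class overlap between predicted and ground-truth voxel labels. The sketch below is a minimal illustration on flat label lists, not the paper's evaluation code: the function name `mean_iou`, the `ignore_index` parameter, and the choice to skip classes absent from both prediction and ground truth are assumptions made for this example.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: equal-length sequences of integer class labels (one per voxel).
    ignore_index: hypothetical label for voxels excluded from scoring.
    """
    ious = []
    for c in range(num_classes):
        if c == ignore_index:
            continue
        inter = 0
        union = 0
        for p, t in zip(pred, target):
            if t == ignore_index:
                continue  # unscored voxel
            p_is_c = p == c
            t_is_c = t == c
            inter += int(p_is_c and t_is_c)
            union += int(p_is_c or t_is_c)
        if union > 0:  # skip classes that never appear
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: four voxels, two classes.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Benchmarks differ in which classes and voxels they score, so the numbers reported for nuScenes depend on the benchmark's own protocol.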
Abstract
3D occupancy prediction is important for autonomous driving due to its comprehensive perception of the surroundings. To incorporate sequential inputs, most existing methods fuse representations from previous frames to infer the current 3D occupancy. However, they fail to consider the continuity of driving scenarios and ignore the strong prior provided by the evolution of 3D scenes (e.g., only dynamic objects move). In this paper, we propose a world-model-based framework to exploit the scene evolution for perception. We reformulate 3D occupancy prediction as a 4D occupancy forecasting problem conditioned on the current sensor input. We decompose the scene evolution into three factors: 1) ego motion alignment of static scenes; 2) local movements of dynamic objects; and 3) completion of newly-observed scenes. We then employ a Gaussian world model (GaussianWorld) to explicitly exploit these…
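The first factor in the abstract, ego motion alignment of static scenes, can be illustrated with a rigid transform applied to Gaussian centers carried over from the previous frame. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `planar_transform` and `align_static_centers` are hypothetical, ego motion is simplified to a yaw rotation plus translation, and only the centers are moved (a full Gaussian representation would also need its orientation rotated). The other two factors, dynamic object movement and completion of newly observed regions, are not modeled here.

```python
import math


def planar_transform(yaw, tx, ty, tz=0.0):
    """Hypothetical helper: 4x4 row-major matrix mapping points from the
    previous ego frame to the current ego frame (yaw rotation + translation)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def align_static_centers(centers, T_prev_to_curr):
    """Re-express static Gaussian centers (x, y, z) in the current ego frame."""
    aligned = []
    for x, y, z in centers:
        aligned.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + row[3]
            for row in T_prev_to_curr[:3]
        ))
    return aligned


# The ego vehicle drove 1 m forward (+x) between frames, so a static point
# that was 5 m ahead in the previous frame is 4 m ahead in the current one.
T = planar_transform(yaw=0.0, tx=-1.0, ty=0.0)
aligned = align_static_centers([(5.0, 0.0, 0.0)], T)
```

The point of the sketch is that static content needs no learned update: once the ego pose change is known, its new position follows from geometry alone, which is the kind of prior the abstract says existing fusion methods leave unused.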
