TL;DR
GEM introduces a deformable Mamba-based generative model for LiDAR world modeling, improving generation fidelity and dynamic scene understanding for autonomous driving applications.
Contribution
The paper presents a deformable Mamba architecture tailored to LiDAR data, enabling better spatio-temporal modeling and scene generation.
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Effectively disentangles static and dynamic features in LiDAR data.
Demonstrates potential for autonomous planning and 'what-if' scenario generation.
Abstract
World models, which simulate environmental dynamics and generate sensor observations, are gaining increasing attention in autonomous driving. However, progress in LiDAR-based world models has lagged behind that of models built on camera videos or occupancy data, primarily due to two core challenges: the inherent disorder of LiDAR point clouds and the difficulty of distinguishing dynamic objects from static structures. To address these issues, we propose GEM: a Generative LiDAR world model that leverages a deformable Mamba architecture, significantly improving fidelity and imaginative capability. Specifically, leveraging the structural similarity between sequential laser scanning and Mamba's processing mechanism, we first tokenize LiDAR sweeps into compact representations via a custom LiDAR scene tokenizer. After unsupervised disentanglement of tokenized features via a dynamic-static separator, a…
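The analogy the abstract draws between sequential laser scanning and Mamba's sequential processing can be illustrated with a toy sketch. This is not the paper's tokenizer or its deformable Mamba block. It assumes a hypothetical `scan_order` helper that sorts an unordered point cloud ring by ring and then by azimuth, and a hypothetical `linear_ssm_scan` that runs a plain, time-invariant linear recurrence h_t = a*h_{t-1} + b*x_t over the ordered points. Mamba's actual selective scan makes these parameters input-dependent. The ring count and the constants `a` and `b` are arbitrary illustration values.

```python
import numpy as np

def scan_order(points, n_rings=4):
    """Return indices that sort points (N, 3) ring-major, azimuth-minor.

    Rings are approximated by binning elevation angle (an assumption;
    real sensors usually report a ring id per point).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    azimuth = np.arctan2(y, x)                    # horizontal angle
    elevation = np.arctan2(z, np.hypot(x, y))     # vertical angle
    edges = np.linspace(elevation.min(), elevation.max(), n_rings + 1)
    ring = np.clip(np.digitize(elevation, edges[1:-1]), 0, n_rings - 1)
    # np.lexsort uses the LAST key as the primary sort key.
    return np.lexsort((azimuth, ring)), ring

def linear_ssm_scan(x, a=0.9, b=0.1):
    """Run h_t = a * h_{t-1} + b * x_t over a sequence x of shape (T, D)."""
    h = np.zeros(x.shape[1])
    out = np.empty(x.shape, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        out[t] = h
    return out

rng = np.random.default_rng(0)
points = rng.normal(size=(256, 3))        # synthetic, unordered "sweep"
order, ring = scan_order(points)
states = linear_ssm_scan(points[order])   # one hidden state per ordered point
```

The point of the ordering step is that a recurrent scan is order-sensitive, so an unordered point cloud needs some canonical traversal before a sequence model can consume it. A scan-like order is one natural choice.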
