DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving
Chen Shi, Shaoshuai Shi, Kehua Sheng, Bo Zhang, Li Jiang

TL;DR
DriveX introduces a self-supervised, generalizable world model for autonomous driving that captures comprehensive scene dynamics and improves performance across multiple tasks, reducing reliance on costly annotations.
Contribution
The paper presents DriveX, a novel omni scene modeling framework that unifies multimodal supervision and decouples world representation learning from future state decoding.
Findings
Achieves state-of-the-art 3D point cloud prediction
Improves occupancy and flow estimation accuracy
Enhances end-to-end driving performance
Abstract
Data-driven learning has advanced autonomous driving, yet task-specific models struggle with out-of-distribution scenarios due to their narrow optimization objectives and reliance on costly annotated data. We present DriveX, a self-supervised world model that learns generalizable scene dynamics and holistic representations (geometric, semantic, and motion) from large-scale driving videos. DriveX introduces Omni Scene Modeling (OSM), a module that unifies multimodal supervision (3D point cloud forecasting, 2D semantic representation, and image generation) to capture comprehensive scene evolution. To simplify learning complex dynamics, we propose a decoupled latent world modeling strategy that separates world representation learning from future state decoding, augmented by dynamic-aware ray sampling to enhance motion modeling. For downstream adaptation, we design Future Spatial Attention…
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications · Data Management and Algorithms
Methods: Softmax · Attention Is All You Need
