OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, and Jose M. Alvarez

TL;DR
OmniDrive introduces a comprehensive 3D vision-language dataset with counterfactual reasoning to improve autonomous driving decision-making and reasoning capabilities.
Contribution
The paper presents OmniDrive, a large-scale dataset with synthetic counterfactual annotations for 3D driving tasks, and explores new agent frameworks for vision-language alignment.
Findings
Enhanced performance on the DriveLM Q&A benchmark
Improved open-loop planning results on nuScenes
Insights into the importance of vision-language alignment and 3D perception
Abstract
Advances in vision-language models (VLMs) have led to growing interest in leveraging their strong reasoning capabilities for autonomous driving. However, extending these capabilities from 2D to full 3D understanding is crucial for real-world applications. To address this challenge, we propose OmniDrive, a holistic vision-language dataset that aligns agent models with 3D driving tasks through counterfactual reasoning. This approach enhances decision-making by evaluating potential scenarios and their outcomes, similar to human drivers considering alternative actions. Our counterfactual-based synthetic data annotation process generates large-scale, high-quality datasets, providing denser supervision signals that bridge planning trajectories and language-based reasoning. Further, we explore two advanced OmniDrive-Agent frameworks, namely Omni-L and Omni-Q, to assess the importance of…
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Transportation and Mobility Innovations
