SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving

Peizheng Li; Zhenghao Zhang; David Holtz; Hang Yu; Yutong Yang; Yuzhi Lai; Rui Song; Andreas Geiger; Andreas Zell

arXiv:2512.10719·cs.CV·May 22, 2026

TL;DR

SpaceDrive introduces a spatial-aware vision language model framework for autonomous driving, explicitly encoding 3D spatial information to improve reasoning and planning accuracy in complex environments.

Contribution

The paper proposes encoding spatial information as explicit 3D positional encodings in VLMs rather than as textual digit tokens, with the aim of enhancing spatial reasoning and trajectory prediction in autonomous driving tasks.

Findings

01 Achieves state-of-the-art open-loop performance on the nuScenes dataset.

02 Attains the second-best Driving Score (78.02) on the Bench2Drive benchmark.

03 Demonstrates improved spatial reasoning and planning accuracy.

Abstract

End-to-end autonomous driving methods built on vision language models (VLMs) have developed rapidly, driven by the universal visual understanding and strong reasoning capabilities that VLMs obtain from large-scale pretraining. However, we find that current VLMs struggle to understand fine-grained 3D spatial relationships, which is a fundamental requirement for systems interacting with the physical world. To address this issue, we propose SpaceDrive, a spatial-aware VLM-based driving framework that treats spatial information as explicit positional encodings (PEs) instead of textual digit tokens, enabling joint reasoning over semantic and spatial representations. SpaceDrive applies a universal positional encoder to all 3D coordinates derived from multi-view depth estimation, historical ego-states, and text prompts. These 3D PEs are first superimposed to augment the corresponding 2D…
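To make the idea of coordinates-as-positional-encodings concrete, here is a minimal sketch in Python with NumPy. It is not the SpaceDrive implementation: the abstract does not specify the form of the encoder, so the sinusoidal encoding, the pinhole back-projection, the choice of element-wise addition for "superimposing", and the names `backproject` and `sinusoidal_pe_3d` are all assumptions made for illustration. The intrinsics, depths, and token features are made-up toy values.

```python
import numpy as np


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixels (u, v) with metric depth to camera-frame XYZ.

    Hypothetical helper: stands in for turning estimated depth into 3D coordinates.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def sinusoidal_pe_3d(xyz, dim, max_freq_log2=6.0):
    """Encode (..., 3) coordinates as (..., dim) sinusoidal features.

    Assumed encoding, not the paper's: sin and cos at dim // 6 frequencies per axis.
    """
    if dim % 6 != 0:
        raise ValueError("dim must be divisible by 6")
    n_freq = dim // 6
    freqs = 2.0 ** np.linspace(0.0, max_freq_log2, n_freq)   # (n_freq,)
    angles = xyz[..., None] * freqs                           # (..., 3, n_freq)
    pe = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return pe.reshape(*xyz.shape[:-1], dim)


# Toy example: four patch centres with made-up depths and camera intrinsics.
u = np.array([100.0, 200.0, 300.0, 400.0])
v = np.array([50.0, 60.0, 70.0, 80.0])
depth = np.array([5.0, 10.0, 15.0, 20.0])
xyz = backproject(u, v, depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)

tokens = np.zeros((4, 48))                                # placeholder token features
tokens_with_pe = tokens + sinusoidal_pe_3d(xyz, dim=48)   # superimpose by addition (assumed)
print(tokens_with_pe.shape)  # (4, 48)
```

The point of the sketch is the contrast drawn in the abstract: a coordinate such as (-2.2, -1.9, 5.0) enters the model as a continuous vector in the same feature space as the token it describes, rather than as a string of digit tokens that the language model has to parse.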

Code & Models

Repositories

zhenghao2519/SpaceDrive (GitHub)

Taxonomy

Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Human Motion and Animation