OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models

Yiwei Zhang; Xuesong Chen; Jin Gao; Hanshi Wang; Fudong Ge; Weiming Hu; Shaoshuai Shi; Zhipeng Zhang

arXiv:2604.17915·cs.CV·April 21, 2026

TL;DR

This paper introduces OneDrive, a unified vision-language-action model for autonomous driving that uses a single transformer decoder to handle multiple driving tasks, reporting state-of-the-art results and faster inference.

Contribution

The work presents a unified framework that handles heterogeneous driving tasks within a pretrained VLM through a single causal decoder, in place of the separate or cascaded decoders that existing systems typically use.

Findings

01. Achieves 0.28 L2 and 0.18 collision rate on nuScenes.

02. Attains 86.8 PDMS on NAVSIM.

03. Reduces inference latency by approximately 40%.

Abstract

Vision-Language Models (VLMs) excel at autoregressive text generation, yet end-to-end autonomous driving requires multi-task learning with structured outputs and heterogeneous decoding behaviors, such as autoregressive language generation, parallel object detection, and trajectory regression. To accommodate these differences, existing systems typically introduce separate or cascaded decoders, resulting in architectural fragmentation and limited backbone reuse. In this work, we present a unified autonomous driving framework built upon a pretrained VLM, where heterogeneous decoding behaviors are reconciled within a single transformer decoder. We demonstrate that pretrained VLM attention exhibits strong transferability beyond pure language modeling. By organizing visual and structured query tokens within a single causal decoder, structured queries can naturally condition on visual context…
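The idea of placing visual and structured query tokens in one causally masked sequence can be illustrated with a toy sketch. This is not the authors' implementation: the token ordering (visual, then query, then text), the token counts, and the helper names `build_token_layout`, `causal_mask`, and `visible_kinds` are all assumptions made for illustration. It only shows why, under a standard causal mask, a query token placed after the visual tokens can attend to all of them.

```python
def build_token_layout(num_visual, num_queries, num_text):
    """Return a list of token kinds in sequence order.

    Assumption: visual tokens come first, then structured queries,
    then text tokens. The paper's actual layout may differ.
    """
    return ["visual"] * num_visual + ["query"] * num_queries + ["text"] * num_text


def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


def visible_kinds(layout, mask, position):
    """Sorted set of token kinds that `position` is allowed to attend to."""
    return sorted({layout[j] for j, allowed in enumerate(mask[position]) if allowed})


if __name__ == "__main__":
    layout = build_token_layout(num_visual=4, num_queries=2, num_text=3)
    mask = causal_mask(len(layout))
    # Position 4 is the first query token in this hypothetical layout.
    print(visible_kinds(layout, mask, 4))
    # Position 0 is a visual token and sees only itself.
    print(visible_kinds(layout, mask, 0))
```

In this sketch no extra cross-attention module is needed for queries to read visual context, since the ordering plus the causal mask already provides it. How the paper handles attention among the query tokens themselves is not stated in the truncated abstract, so the strictly causal mask here should be read as a simplification.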

Code & Models

Repositories

Z1zyw/OneDrive (GitHub)
