VECTOR-Drive: Tightly Coupled Vision-Language and Trajectory Expert Routing for End-to-End Autonomous Driving

Rui Zhao; Jianlin Yu; Zhenhai Gao; Jiaqiao Liu; Fei Gao

arXiv:2605.08830 · cs.CV · May 20, 2026

TL;DR

VECTOR-Drive introduces a tightly coupled vision-language and trajectory expert routing framework for end-to-end autonomous driving, improving the integration of semantic understanding and motion planning.

Contribution

It presents a novel multimodal Transformer architecture with expert routing that improves semantic-motion coupling in autonomous driving models.

Findings

1. Achieves a Driving Score of 88.91 on the Bench2Drive benchmark.
2. Outperforms existing end-to-end and VLA-based baselines.
3. Validates the benefits of shared attention and expert routing.

Abstract

End-to-end autonomous driving requires models to understand traffic scenes, infer driving intent, and generate executable motion plans. Recent vision-language-action (VLA) models inherit semantic priors from large-scale vision-language pretraining, yet still face a coupling trade-off: fully shared backbones preserve multimodal interaction but may entangle language reasoning and trajectory prediction, whereas decoupled reasoning-action pipelines reduce task conflict but weaken semantic-motion coupling. We propose VECTOR-Drive, a tightly coupled VLA framework built on Qwen2.5-VL-3B. VECTOR-Drive keeps all tokens coupled through shared self-attention and routes feed-forward computation according to token semantics. Vision and language tokens are processed by a Vision-Language Expert to preserve semantic priors, while target-point, ego-state, and noisy action tokens are routed to a…
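The abstract's central mechanism, shared self-attention over all tokens followed by a feed-forward step selected by token type, can be illustrated with a toy block in pure Python. This is a minimal sketch under our own assumptions, not the authors' implementation: it uses single-head attention, no normalization or masking, tiny random weights, and hard routing on a hypothetical token-type label. The names `vl_expert`, `action_expert`, `routed_block` and `make_params` are hypothetical; the abstract's name for the second expert is truncated.

```python
import math
import random

# Hypothetical token-type labels. Per the abstract, vision/language tokens go to a
# Vision-Language Expert; target-point, ego-state and noisy action tokens go to a
# second expert whose name is truncated in the abstract ("action_expert" is our label).
VL_TYPES = {"vision", "language"}
ACTION_TYPES = {"target_point", "ego_state", "noisy_action"}


def matmul(x, w):
    """Multiply row vectors x (n x d_in) by matrix w (d_in x d_out)."""
    return [[sum(xi * w[i][j] for i, xi in enumerate(row)) for j in range(len(w[0]))]
            for row in x]


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def shared_self_attention(x, wq, wk, wv):
    """Single-head attention over ALL tokens, so every token type attends to every other."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        attn = softmax(scores)
        out.append([sum(attn[j] * v[j][c] for j in range(len(v))) for c in range(d)])
    return out


def ffn(x, w1, w2):
    """Two-layer feed-forward network with ReLU."""
    h = [[max(0.0, a) for a in row] for row in matmul(x, w1)]
    return matmul(h, w2)


def routed_block(x, token_types, params):
    """Shared attention, then a per-token FFN chosen by token type (hard routing)."""
    attn = shared_self_attention(x, params["wq"], params["wk"], params["wv"])
    h = [[a + b for a, b in zip(xi, ai)] for xi, ai in zip(x, attn)]  # residual
    out = []
    for row, t in zip(h, token_types):
        if t in VL_TYPES:
            expert = params["vl_expert"]
        elif t in ACTION_TYPES:
            expert = params["action_expert"]
        else:
            raise ValueError(f"unknown token type: {t}")
        y = ffn([row], *expert)[0]
        out.append([a + b for a, b in zip(row, y)])  # residual
    return out


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_params(d, d_ff, rng):
    """Random toy weights: shared attention projections plus two FFN experts."""
    return {
        "wq": rand_matrix(d, d, rng),
        "wk": rand_matrix(d, d, rng),
        "wv": rand_matrix(d, d, rng),
        "vl_expert": (rand_matrix(d, d_ff, rng), rand_matrix(d_ff, d, rng)),
        "action_expert": (rand_matrix(d, d_ff, rng), rand_matrix(d_ff, d, rng)),
    }


rng = random.Random(0)
types = ["vision", "vision", "language", "target_point", "ego_state", "noisy_action"]
params = make_params(d=4, d_ff=8, rng=rng)
x = rand_matrix(len(types), 4, rng)
y = routed_block(x, types, params)
print(len(y), len(y[0]))  # 6 4
```

Because routing happens only in the feed-forward step, action tokens still attend to vision and language tokens (and vice versa) inside the shared attention, which is the coupling the abstract describes. Within this single block, changing the action expert's weights leaves the outputs of vision and language tokens untouched, which is the separation that routing is meant to provide.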
