SAMoE-VLA: A Scene Adaptive Mixture-of-Experts Vision-Language-Action Model for Autonomous Driving

Zihan You; Hongwei Liu; Chenxu Dang; Zhe Wang; Sining Ang; Aoqi Wang; Yan Wang

arXiv:2603.08113 · cs.CV · March 10, 2026

TL;DR

SAMoE-VLA introduces a scene-adaptive MoE framework for autonomous driving that leverages scene context for expert routing, improving safety and performance over token-based methods.

Contribution

The paper proposes a scene-adaptive MoE mechanism whose expert routing is conditioned on BEV features, together with a cross-modal causal attention mechanism, to improve decision-making in autonomous driving.

Findings

01. Achieves state-of-the-art results on the nuScenes and LangAuto datasets.

02. Outperforms prior VLA and world-model-based approaches.

03. Uses fewer parameters than existing models.

Abstract

Recent advances in Vision-Language-Action (VLA) models have shown promising capabilities in autonomous driving by leveraging the understanding and reasoning strengths of Large Language Models (LLMs). However, our empirical analysis reveals that directly applying existing token-level Mixture-of-Experts (MoE) mechanisms, which are inherited from LLM architectures, to VLA models results in unstable performance and safety degradation in autonomous driving, highlighting a misalignment between token-based expert specialization and scene-level decision-making. To address this, we propose SAMoE-VLA, a scene-adaptive Vision-Language-Action framework that conditions expert selection on structured scene representations instead of token embeddings. Our key idea is to derive the MoE routing signal from bird's-eye-view (BEV) features that encapsulate traffic scene context, enabling scenario-dependent expert weighting and…
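
To make the contrast between token-level and scene-level routing concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the mean pooling of BEV cells, the dense softmax gate, the linear experts, and every name and dimension (`scene_level_moe`, `token_level_moe`, `D_BEV`, and so on) are assumptions chosen for illustration only. The point it shows is that a token-level router produces a separate set of expert weights for each token, whereas a scene-level router produces one set of weights from the BEV summary and shares it across all tokens in the scene.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    # Multiply a (rows x cols) matrix by a vector of length cols.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def mean_pool(bev_cells):
    # Assumption: the scene is summarised by averaging BEV cell features.
    n = len(bev_cells)
    dim = len(bev_cells[0])
    return [sum(cell[i] for cell in bev_cells) / n for i in range(dim)]


def mix_experts(token, experts, weights):
    # Weighted sum of the outputs of all (hypothetical, linear) experts.
    outputs = [matvec(expert, token) for expert in experts]
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(token))]


def token_level_moe(tokens, experts, token_router):
    # Each token computes its own gate from its own embedding.
    all_weights = [softmax(matvec(token_router, t)) for t in tokens]
    outs = [mix_experts(t, experts, w) for t, w in zip(tokens, all_weights)]
    return outs, all_weights


def scene_level_moe(tokens, bev_cells, experts, scene_router):
    # One gate per scene, derived from BEV features and shared by all tokens.
    weights = softmax(matvec(scene_router, mean_pool(bev_cells)))
    outs = [mix_experts(t, experts, weights) for t in tokens]
    return outs, weights


rng = random.Random(0)
D_TOKEN, D_BEV, N_EXPERTS = 4, 6, 3  # illustrative sizes only
experts = [random_matrix(D_TOKEN, D_TOKEN, rng) for _ in range(N_EXPERTS)]
token_router = random_matrix(N_EXPERTS, D_TOKEN, rng)
scene_router = random_matrix(N_EXPERTS, D_BEV, rng)
tokens = random_matrix(5, D_TOKEN, rng)      # 5 token embeddings
bev_cells = random_matrix(8, D_BEV, rng)     # 8 BEV grid cells

token_outs, token_weights = token_level_moe(tokens, experts, token_router)
scene_outs, scene_weights = scene_level_moe(tokens, bev_cells, experts, scene_router)
```

In this sketch `token_weights` holds five different weight vectors, one per token, while `scene_weights` is a single vector of three expert weights applied to every token. Whether the actual model uses dense weighting, top-k selection, or a different BEV summary is not stated in the excerpt above.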

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety