WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving

Mingwang Xu; Jiahao Cui; Feipeng Cai; Hanlin Shang; Zhihao Zhu; Shan Luan; Yifang Xu; Neng Zhang; Yaoyi Li; Jia Cai; Siyu Zhu

arXiv:2512.11872·cs.RO·December 17, 2025

TL;DR

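WAM-Diff is a vision-language-action (VLA) framework for autonomous driving that generates ego-trajectories with discrete masked diffusion, scales model capacity with a sparse mixture-of-experts (MoE) architecture, and applies online reinforcement learning. It reports 91.0 PDMS on NAVSIM-v1 and 89.7 EPDMS on NAVSIM-v2.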

Contribution

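The paper adapts discrete masked diffusion to ego-trajectory generation for autonomous driving, supporting flexible, non-causal decoding orders. It combines this with a sparse MoE architecture trained jointly on motion prediction and driving-oriented visual question answering (VQA), and with online reinforcement learning.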

Findings

01. Achieves 91.0 PDMS on NAVSIM-v1
02. Achieves 89.7 EPDMS on NAVSIM-v2
03. Demonstrates effectiveness of masked diffusion in autonomous driving

Abstract

End-to-end autonomous driving systems based on vision-language-action (VLA) models integrate multimodal sensor inputs and language instructions to generate planning and control signals. While autoregressive large language models and continuous diffusion policies are prevalent, the potential of discrete masked diffusion for trajectory generation remains largely unexplored. This paper presents WAM-Diff, a VLA framework that employs masked diffusion to iteratively refine a discrete sequence representing future ego-trajectories. Our approach features three key innovations: a systematic adaptation of masked diffusion for autonomous driving that supports flexible, non-causal decoding orders; scalable model capacity via a sparse MoE architecture trained jointly on motion prediction and driving-oriented visual question answering (VQA); and online reinforcement learning using Group Sequence…
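To make the idea of iteratively refining a discrete sequence in a non-causal order more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `toy_denoiser` function, the `MASK` sentinel, and the confidence-based unmasking schedule (a common choice for masked generative models) are all assumptions made for illustration, and the toy model ignores camera and language inputs entirely.

```python
import random

MASK = -1  # hypothetical sentinel marking a still-masked trajectory token


def toy_denoiser(tokens, vocab_size, rng):
    """Stand-in for a learned model: returns a (token, confidence) pair per position.

    A real model would condition on sensor and language inputs; this toy
    version draws random guesses so that the decoding loop can be run.
    """
    preds = []
    for tok in tokens:
        if tok == MASK:
            preds.append((rng.randrange(vocab_size), rng.random()))
        else:
            preds.append((tok, 1.0))  # committed tokens are left unchanged
    return preds


def masked_diffusion_decode(seq_len, vocab_size, num_steps, seed=0):
    """Start fully masked and commit the most confident predictions each step."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = toy_denoiser(tokens, vocab_size, rng)
        # Spread the remaining masked positions evenly over the remaining steps.
        remaining_steps = num_steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division
        # Positions are chosen by confidence, not left to right (non-causal order).
        chosen = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in chosen:
            tokens[i] = preds[i][0]
    return tokens


if __name__ == "__main__":
    print(masked_diffusion_decode(seq_len=8, vocab_size=16, num_steps=4))
```

In this sketch the order in which positions are filled depends only on the model's confidence, which is one way a masked diffusion decoder can depart from the fixed left-to-right order of an autoregressive model.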

Code & Models

Models

🤗 fudan-generative-ai/WAM-Diff · model · 10 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety