DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving

Pengxuan Yang; Yupeng Zheng; Deheng Qian; Zebin Xing; Qichao Zhang; Linbo Wang; Yichen Zhang; Shaoyu Guo; Zhongpu Xia; Qiang Chen; Junyu Han; Lingyun Xu; Yifeng Pan; Dongbin Zhao

arXiv:2603.24587 · cs.LG · April 2, 2026

TL;DR

DreamerAD is a latent world model framework that speeds up reinforcement learning for autonomous driving by compressing diffusion sampling from 100 steps to 1, reaching 87.7 EPDMS on NavSim v2 while keeping visual interpretability.

Contribution

DreamerAD presents the first latent world model framework for autonomous driving that reduces diffusion sampling from 100 steps to 1, enabling efficient RL training while maintaining visual interpretability.

Findings

01. Achieves 87.7 EPDMS on NavSim v2, setting a new state-of-the-art.

02. Reduces diffusion inference latency from 2s/frame to near real-time.

03. Maintains visual interpretability while improving training efficiency.
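
To make the latency claim concrete, the arithmetic can be worked through in a few lines. This is a rough sketch, not a measurement: it assumes the 80x speedup quoted in the abstract applies directly to the 2s/frame baseline, and all variable names are illustrative.

```python
# Figures quoted in the abstract; applying the speedup to the per-frame
# latency is an assumption of this sketch, not a reported measurement.
BASELINE_LATENCY_S = 2.0   # multi-step diffusion inference, seconds per frame
SPEEDUP = 80.0             # reported speedup
BASELINE_STEPS = 100       # diffusion sampling steps before compression
COMPRESSED_STEPS = 1       # diffusion sampling steps after compression

latency_s = BASELINE_LATENCY_S / SPEEDUP        # implied seconds per frame
latency_ms = latency_s * 1000.0                 # same value in milliseconds
frames_per_second = 1.0 / latency_s             # implied imagination throughput
step_ratio = BASELINE_STEPS / COMPRESSED_STEPS  # reduction in sampling steps

print(f"implied latency: {latency_ms:.1f} ms/frame")
print(f"implied throughput: {frames_per_second:.1f} frames/s")
print(f"step reduction: {step_ratio:.0f}x vs. reported speedup: {SPEEDUP:.0f}x")
```

Under that assumption the implied latency is 25 ms per frame, or roughly 40 frames per second. The step count drops 100x while the reported speedup is 80x, so the two figures should not be treated as interchangeable.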

Abstract

We introduce DreamerAD, the first latent world model framework that enables efficient reinforcement learning for autonomous driving by compressing diffusion sampling from 100 steps to 1, achieving an 80x speedup while maintaining visual interpretability. Training RL policies on real-world driving data incurs prohibitive costs and safety risks. While existing pixel-level diffusion world models enable safe imagination-based training, they suffer from multi-step diffusion inference latency (2s/frame) that prevents high-frequency RL interaction. Our approach leverages denoised latent features from video generation models through three key mechanisms: (1) shortcut forcing that reduces sampling complexity via recursive multi-resolution step compression, (2) an autoregressive dense reward model operating directly on latent representations for fine-grained credit assignment, and (3) Gaussian…
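
The excerpt does not specify how shortcut forcing works, so the following is only a toy illustration of the general idea behind step compression: an update rule conditioned on the step size can cover in one jump what a plain Euler sampler needs many small steps to approximate. The ODE, the constant `K`, and the functions `velocity`, `shortcut`, `euler_sample`, and `shortcut_sample` are all hypothetical stand-ins, and the "shortcut" here is analytic rather than a learned network.

```python
import math

K = 3.0  # decay rate of the toy ODE dx/dt = -K * x (hypothetical stand-in for a denoising flow)


def velocity(x: float) -> float:
    """Instantaneous velocity of the toy ODE."""
    return -K * x


def shortcut(x: float, d: float) -> float:
    """Average velocity over a jump of size d, so x + d * shortcut(x, d) is exact.

    In a learned model this would be a network conditioned on d;
    here it is computed in closed form for illustration only.
    """
    return x * (math.exp(-K * d) - 1.0) / d


def euler_sample(x0: float, steps: int) -> float:
    """Integrate from t=0 to t=1 with plain Euler steps of size 1/steps."""
    x, d = x0, 1.0 / steps
    for _ in range(steps):
        x = x + d * velocity(x)
    return x


def shortcut_sample(x0: float, steps: int) -> float:
    """Integrate from t=0 to t=1 with step-size-conditioned jumps."""
    x, d = x0, 1.0 / steps
    for _ in range(steps):
        x = x + d * shortcut(x, d)
    return x


x0 = 1.0
exact = x0 * math.exp(-K)          # closed-form solution at t=1
euler_100 = euler_sample(x0, 100)  # many small steps: close to exact
euler_1 = euler_sample(x0, 1)      # one naive step: badly off
short_1 = shortcut_sample(x0, 1)   # one step-size-aware jump: matches exact
short_2 = shortcut_sample(x0, 2)   # two half jumps agree with one full jump

print(f"exact={exact:.5f} euler_100={euler_100:.5f} euler_1={euler_1:.5f}")
print(f"shortcut_1={short_1:.5f} shortcut_2={short_2:.5f}")
```

The point of the toy is the contrast between `euler_1` and `short_1`: collapsing a standard sampler to one step degrades the result, whereas a model that is told the step size can in principle be trained to make the large jump directly. The agreement between `short_1` and `short_2` mirrors the self-consistency property that shortcut-style objectives typically enforce.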
