Uni-World VLA: Interleaved World Modeling and Planning for Autonomous Driving

Qiqi Liu; Huan Xu; Jingyu Li; Bin Sun; Zhihui Hao; Dangen She; Xiatian Zhu; Li Zhang

arXiv:2603.27287 · cs.RO · March 31, 2026

TL;DR

Uni-World VLA introduces an interleaved world modeling and planning approach for autonomous driving, enabling continuous, adaptive decision-making by alternating between predicting future observations and planning actions.

Contribution

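The paper proposes a unified vision-language-action (VLA) model that couples world prediction and planning by interleaving them step by step, so that each planning decision is conditioned on the imagined future observations, with the aim of more adaptive decision-making in dynamic traffic scenarios.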

Findings

01 Achieves competitive closed-loop planning performance on the NAVSIM benchmark.

02 Produces high-fidelity future frame predictions with integrated monocular depth cues.

03 Demonstrates the effectiveness of interleaved modeling and planning for autonomous driving.

Abstract

Autonomous driving requires reasoning about how the environment evolves and planning actions accordingly. Existing world-model-based approaches typically predict future scenes first and plan afterwards, resulting in open-loop imagination that may drift from the actual decision process. In this paper, we present Uni-World VLA, a unified vision-language-action (VLA) model that tightly interleaves future frame prediction and trajectory planning. Instead of generating a full world rollout before planning, our model alternates between predicting future frames and ego actions step by step, allowing planning decisions to be continuously conditioned on the imagined future observations. This interleaved generation forms a closed-loop interaction between world modeling and control, enabling more adaptive decision-making in dynamic traffic scenarios. In addition, we incorporate monocular depth…
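The alternation described in the abstract can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `interleaved_rollout`, `toy_predict_frame`, and `toy_plan_action` are hypothetical names, a "frame" is reduced to a single number (the gap to a lead vehicle), an "action" is reduced to an ego speed, and the frame-then-action ordering within each step is assumed rather than taken from the paper.

```python
from typing import Callable, List, Tuple

# Toy stand-ins (assumptions, not the paper's data types):
Frame = float   # a "frame" is just the gap to a lead vehicle, in metres
Action = float  # an "action" is just the ego speed, in m/s


def interleaved_rollout(
    first_frame: Frame,
    predict_frame: Callable[[List[Frame], List[Action]], Frame],
    plan_action: Callable[[List[Frame], List[Action]], Action],
    steps: int,
) -> Tuple[List[Frame], List[Action]]:
    """Alternate one imagined frame and one planned action per step."""
    frames: List[Frame] = [first_frame]
    actions: List[Action] = []
    for _ in range(steps):
        # Imagine the next observation from everything generated so far.
        frames.append(predict_frame(frames, actions))
        # Plan the next action with the freshly imagined frame in context.
        actions.append(plan_action(frames, actions))
    return frames, actions


def toy_predict_frame(frames: List[Frame], actions: List[Action]) -> Frame:
    # Hypothetical dynamics: lead car at 5 m/s, 1 s per step,
    # ego assumed at 10 m/s before any action has been planned.
    ego_speed = actions[-1] if actions else 10.0
    return frames[-1] + (5.0 - ego_speed) * 1.0


def toy_plan_action(frames: List[Frame], actions: List[Action]) -> Action:
    # Hypothetical policy: keep 10 m/s while the imagined gap exceeds 20 m,
    # otherwise match the lead car's 5 m/s.
    return 10.0 if frames[-1] > 20.0 else 5.0


frames, actions = interleaved_rollout(
    30.0, toy_predict_frame, toy_plan_action, steps=3
)
print(frames)   # [30.0, 25.0, 20.0, 20.0]
print(actions)  # [10.0, 5.0, 5.0]
```

In this toy run the second action drops to 5 m/s because the frame imagined at that step shows the gap closing to 20 m, and the third imagined frame in turn reflects the slower speed. That mutual conditioning between imagined observations and planned actions is the point of the sketch; the real model generates image frames and trajectories, not scalars.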

Code & Models

Repositories

logosroboticsgroup/UniWorldVLA (GitHub)
