AstraNav-World: World Model for Foresight Control and Consistency

Jintao Chen; Junjun Hu; Haochen Bai; Minghua Luo; Xinda Xue; Botao Ren; Chengyu Bai; Shichao Xie; Ziyi Chen; Fei Liu; Zedong Chu; Xiaolong Wu; Mu Xu; Shanghang Zhang

arXiv:2512.21714 · cs.CV · April 9, 2026

TL;DR

AstraNav-World introduces a unified probabilistic world model that jointly predicts future visuals and actions, enhancing embodied navigation in dynamic environments with improved accuracy and zero-shot real-world adaptation.

Contribution

The paper presents an end-to-end diffusion-based framework that tightly couples visual prediction with action planning, with the aim of making embodied navigation models more robust and more transferable.

Findings

01. Improved trajectory accuracy and success rates across benchmarks.

02. Tight vision-action coupling enhances prediction quality and policy reliability.

03. Zero-shot real-world adaptation without fine-tuning.

Abstract

Embodied navigation in open, dynamic environments demands accurate foresight of how the world will evolve and how actions will unfold over time. We propose AstraNav-World, an end-to-end world model that jointly reasons about future visual states and action sequences within a unified probabilistic framework. Our framework integrates a diffusion-based video generator with a vision-language policy, enabling synchronized rollouts where predicted scenes and planned actions are updated simultaneously. Training optimizes two complementary objectives: generating action-conditioned multi-step visual predictions and deriving trajectories conditioned on those predicted visuals. This bidirectional constraint makes visual predictions executable and keeps decisions grounded in physically consistent, task-relevant futures, mitigating cumulative errors common in decoupled "envision-then-plan"…
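The two training objectives described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the real system pairs a diffusion-based video generator with a vision-language policy, whereas the code below uses small float vectors in place of images and trajectories. All names (`synchronized_rollout`, `joint_loss`, `toy_policy`, `toy_predictor`, `action_weight`) are hypothetical, and the step-by-step alternation between planning and prediction is an assumed simplification of the simultaneous updates the abstract mentions.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def synchronized_rollout(
    state: Vector,
    policy: Callable[[Vector], Vector],
    predictor: Callable[[Vector, Vector], Vector],
    horizon: int,
) -> Tuple[List[Vector], List[Vector]]:
    """Roll out predicted states and planned actions together.

    Assumption: each step plans an action from the current predicted state,
    then predicts the next state conditioned on that action.
    """
    states: List[Vector] = []
    actions: List[Vector] = []
    for _ in range(horizon):
        action = policy(state)            # trajectory conditioned on predicted visuals
        state = predictor(state, action)  # visual prediction conditioned on the action
        actions.append(action)
        states.append(state)
    return states, actions


def joint_loss(
    pred_states: List[Vector],
    pred_actions: List[Vector],
    true_states: List[Vector],
    true_actions: List[Vector],
    action_weight: float = 1.0,
) -> float:
    """Sum of a visual-prediction term and a weighted action term (hypothetical weighting)."""
    if len(pred_states) != len(true_states) or len(pred_actions) != len(true_actions):
        raise ValueError("predicted and target sequences must have equal length")
    visual = sum(mse(p, t) for p, t in zip(pred_states, true_states)) / len(true_states)
    action = sum(mse(p, t) for p, t in zip(pred_actions, true_actions)) / len(true_actions)
    return visual + action_weight * action


def toy_policy(state: Vector) -> Vector:
    """Hypothetical policy: step halfway toward the origin."""
    return [-0.5 * s for s in state]


def toy_predictor(state: Vector, action: Vector) -> Vector:
    """Hypothetical dynamics: the next state is the current state plus the action."""
    return [s + a for s, a in zip(state, action)]
```

Calling `synchronized_rollout([4.0], toy_policy, toy_predictor, horizon=3)` yields paired state and action sequences that can be scored against reference sequences with `joint_loss`. Because both terms are computed on the same rollout, an error in either the predicted states or the planned actions raises the combined loss, which is the intuition behind the bidirectional constraint described above.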

Code & Models

Repositories

amap-cvlab/AstraNav-World (GitHub)
