MWM: Mobile World Models for Action-Conditioned Consistent Prediction

Han Yan; Zishang Xiang; Zeyu Zhang; Hao Tang

arXiv:2603.07799·cs.CV·March 10, 2026

TL;DR

MWM introduces a novel training framework for action-conditioned world models that enhances rollout consistency and inference efficiency, significantly improving visual fidelity and navigation success in embodied planning tasks.

Contribution

The paper presents a two-stage training approach that combines structure pretraining with Action-Conditioned Consistency (ACC) post-training, along with Inference-Consistent State Distillation (ICSD) for few-step diffusion distillation. Together these target rollout inconsistency and slow inference in navigation world models.

Findings

01. Improved visual fidelity and trajectory accuracy in benchmarks.

02. Enhanced planning success rates in real-world navigation tasks.

03. Increased inference efficiency with few-step diffusion distillation.

Abstract

World models enable planning in an imagined space of predicted futures, offering a promising framework for embodied navigation. However, existing navigation world models often lack action-conditioned consistency, so visually plausible predictions can still drift under multi-step rollout and degrade planning. Moreover, efficient deployment requires few-step diffusion inference, but existing distillation methods do not explicitly preserve rollout consistency, creating a training-inference mismatch. To address these challenges, we propose MWM, a mobile world model for planning-based image-goal navigation. Specifically, we introduce a two-stage training framework that combines structure pretraining with Action-Conditioned Consistency (ACC) post-training to improve action-conditioned rollout consistency. We further introduce Inference-Consistent State Distillation (ICSD) for few-step diffusion…

Taxonomy

Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics