RAE-NWM: Navigation World Model in Dense Visual Representation Space

Mingkun Zhang, Wangtian Shen, Fan Zhang, Haijian Qin, Zihao Pei, and Ziyang Meng

arXiv:2603.09241 · cs.CV · March 11, 2026

TL;DR

This paper introduces RAE-NWM, a novel navigation world model that operates in a dense visual feature space, improving structural stability and control in visual navigation tasks.

Contribution

The paper proposes RAE-NWM, a new model that leverages dense visual features and a diffusion transformer to enhance navigation accuracy and stability.

Findings

01. In a linear dynamics probe, dense DINOv2 features show stronger linear predictability for action-conditioned transitions.

02. Modeling navigation dynamics in a dense feature space improves structural stability.

03. Navigation performance improves in complex environments.

Abstract

Visual navigation requires agents to reach goals in complex environments through perception and planning. World models address this task by simulating action-conditioned state transitions to predict future observations. Current navigation world models typically learn state evolution under actions within the compressed latent space of a Variational Autoencoder, where spatial compression often discards fine-grained structural information and hinders precise control. To better understand the propagation characteristics of different representations, we conduct a linear dynamics probe and observe that dense DINOv2 features exhibit stronger linear predictability for action-conditioned transitions. Motivated by this observation, we propose the Representation Autoencoder-based Navigation World Model (RAE-NWM), which models navigation dynamics in a dense visual representation space. We employ a…
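The abstract mentions a linear dynamics probe but this excerpt does not describe its protocol. The following is only a minimal sketch of what such a probe could look like, assuming it means fitting a linear map from the current features and action to the next-step features and scoring it on held-out transitions. The function names, the ridge regularizer, the R² score, and the synthetic data are all illustrative assumptions, not the authors' setup.

```python
import numpy as np


def fit_linear_probe(z_t, a_t, z_next, ridge=1e-3):
    """Fit z_next ~ [z_t, a_t, 1] @ W with closed-form ridge regression.

    z_t, z_next: (N, D) feature arrays; a_t: (N, A) action array.
    The ridge term is an assumption added for numerical stability.
    """
    X = np.concatenate([z_t, a_t, np.ones((len(z_t), 1))], axis=1)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ z_next)  # shape (D + A + 1, D)


def probe_r2(W, z_t, a_t, z_next):
    """Coefficient of determination of the fitted linear map on given data."""
    X = np.concatenate([z_t, a_t, np.ones((len(z_t), 1))], axis=1)
    ss_res = np.sum((z_next - X @ W) ** 2)
    ss_tot = np.sum((z_next - z_next.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic stand-ins for two representations (hypothetical data):
    # one whose transitions are linear in (z, a), one whose are not.
    rng = np.random.default_rng(0)
    z = rng.normal(size=(2000, 8))
    a = rng.normal(size=(2000, 3))
    A = rng.normal(size=(8, 8))
    B = rng.normal(size=(3, 8))
    targets = {
        "linear transitions": z @ A + a @ B,
        "nonlinear transitions": (z @ A) ** 2,
    }
    train, test = slice(0, 1500), slice(1500, 2000)
    for name, z_next in targets.items():
        W = fit_linear_probe(z[train], a[train], z_next[train])
        print(name, "held-out R^2:", probe_r2(W, z[test], a[test], z_next[test]))
```

Under this reading, a representation with a higher held-out score is one whose action-conditioned transitions are better explained by a linear map. To compare real representations, the synthetic arrays would be replaced with features extracted from consecutive frames by each encoder.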

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety