Efficient Image-Goal Navigation with Representative Latent World Model

Zhiwei Zhang; Hui Zhang; Kaihong Huang; Chenghao Shi; Huimin Lu

arXiv:2511.11011 · cs.RO · December 22, 2025

TL;DR

This paper introduces ReL-NWM, a novel latent space world model for efficient image-goal navigation that bypasses pixel-level reconstruction, enabling fast planning and successful real-world deployment.

Contribution

The paper presents ReL-NWM, a high-level semantic latent space model that improves navigation efficiency and performance over traditional pixel-based world models.

Findings

1. Achieves state-of-the-art trajectory prediction accuracy.

2. Demonstrates effective image-goal navigation in benchmarks.

3. Successfully deployed on a real humanoid robot.

Abstract

World models enable robots to conduct counterfactual reasoning in physical environments by predicting future world states. While conventional approaches often prioritize pixel-level reconstruction of future scenes, such detailed rendering is computationally intensive and unnecessary for planning tasks like navigation. We therefore propose that prediction and planning can be efficiently performed directly within a latent space of high-level semantic representations. To realize this, we introduce the Representative Latent space Navigation World Model (ReL-NWM). Rather than relying on reconstruction-oriented latent embeddings, our method leverages a pre-trained representation encoder, DINOv3, and incorporates specialized mechanisms to effectively integrate action signals and historical context within this representation space. By operating entirely in the latent domain, our model bypasses…
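
To make the idea of planning directly in a latent space concrete, here is a minimal sketch under stated assumptions. It is not the ReL-NWM implementation: the latent vectors are hand-written toy values rather than DINOv3 features, `toy_latent_step` is a hypothetical stand-in for a learned action-conditioned dynamics model, and the random-shooting planner is an illustrative choice that the abstract does not specify. The point it shows is that candidate action sequences can be scored by comparing predicted latents with a goal latent, with no image ever rendered.

```python
import math
import random


def latent_distance(a, b):
    """Euclidean distance between two latent vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def toy_latent_step(z, action):
    """Hypothetical stand-in for a learned latent dynamics model.

    Assumption: the first two latent dimensions move with the 2-D action
    and the remaining dimensions stay fixed.
    """
    dx, dy = action
    return [z[0] + dx, z[1] + dy] + list(z[2:])


def rollout(z_start, actions, step_fn):
    """Predict the final latent state after applying a sequence of actions."""
    z = list(z_start)
    for action in actions:
        z = step_fn(z, action)
    return z


def plan_in_latent_space(z_start, z_goal, step_fn, horizon=5,
                         num_samples=256, seed=0):
    """Random-shooting planner: sample action sequences, keep the best one."""
    rng = random.Random(seed)
    # Start from the "stay still" sequence so the returned plan is never
    # worse than doing nothing.
    best_actions = [(0.0, 0.0)] * horizon
    best_cost = latent_distance(rollout(z_start, best_actions, step_fn), z_goal)
    for _ in range(num_samples):
        actions = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
                   for _ in range(horizon)]
        cost = latent_distance(rollout(z_start, actions, step_fn), z_goal)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


# Toy latents (assumed values); a real system would obtain these by
# encoding the current observation and the goal image.
z_start = [0.0, 0.0, 1.0]
z_goal = [2.0, -1.0, 1.0]
actions, cost = plan_in_latent_space(z_start, z_goal, toy_latent_step)
```

In this sketch the cost of evaluating one candidate is a handful of vector operations per step, which illustrates why skipping pixel-level reconstruction can make planning cheaper. A practical planner would likely use a stronger optimizer and a learned dynamics model in place of the toy pieces here.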

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI