NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants

Yiran Qin; Ao Sun; Yuze Hong; Benyou Wang; Ruimao Zhang

arXiv:2502.13894 · cs.RO · February 20, 2025

TL;DR

NavigateDiff leverages vision-language models and diffusion networks to enable zero-shot visual navigation, improving robot adaptability and efficiency in unfamiliar environments without extensive retraining.

Contribution

The paper introduces NavigateDiff, a novel approach that combines large vision-language models with diffusion networks to predict future observations and guide zero-shot navigation.

Findings

1. Enhanced navigation robustness in diverse environments
2. Effective generalization to unseen scenes
3. Improved efficiency over traditional reinforcement learning (RL) methods

Abstract

Navigating unfamiliar environments presents significant challenges for household robots, which must recognize and reason about novel decorations and layouts. Existing reinforcement learning methods cannot be directly transferred to new environments because they typically rely on extensive mapping and exploration, which is time-consuming and inefficient. To address these challenges, we aim to transfer the logical knowledge and generalization ability of pre-trained foundation models to zero-shot navigation. By integrating a large vision-language model with a diffusion network, our approach, named NavigateDiff, constructs a visual predictor that continuously predicts the agent's potential observation at the next step, which helps the robot generate robust actions. Furthermore, to account for the temporal nature of navigation, we introduce temporal historical information to ensure that…
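The abstract describes a loop in which a visual predictor imagines the agent's next observation, conditioned on recent history, and an action is then generated from that prediction. The sketch below illustrates only that control flow, under loud assumptions: `PredictThenActAgent`, `toy_predictor`, and `toy_policy` are hypothetical names, each observation is reduced to a single made-up "distance to goal" number, and the predictor is a simple trend extrapolator standing in for the paper's vision-language model and diffusion network. It is not the authors' implementation, and the history length is arbitrary.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

# Hypothetical: an observation is a flat feature vector (here, one number).
Observation = List[float]
Predictor = Callable[[Observation, Sequence[Observation], str], Observation]
Policy = Callable[[Observation, Observation], str]


class PredictThenActAgent:
    """Hypothetical predict-then-act loop inspired by the abstract (not the authors' code)."""

    def __init__(self, predictor: Predictor, policy: Policy, history_len: int = 4) -> None:
        self.predictor = predictor
        self.policy = policy
        # Bounded buffer of past observations, standing in for "temporal historical information".
        self.history: Deque[Observation] = deque(maxlen=history_len)

    def step(self, obs: Observation, goal: str) -> str:
        # 1) Imagine the next observation from the current one, recent history, and the goal.
        predicted = self.predictor(obs, tuple(self.history), goal)
        # 2) Pick an action that moves from the current toward the imagined observation.
        action = self.policy(obs, predicted)
        # 3) Remember the current observation; deque(maxlen=...) drops the oldest entry.
        self.history.append(obs)
        return action


def toy_predictor(obs: Observation, history: Sequence[Observation], goal: str) -> Observation:
    # Stand-in for the VLM + diffusion predictor: extrapolate the latest change.
    # obs[0] is a hypothetical "distance to goal" feature; the goal text is ignored here.
    if not history:
        return [obs[0] - 1.0]  # with no history, optimistically assume progress
    delta = obs[0] - history[-1][0]
    return [obs[0] + delta]


def toy_policy(obs: Observation, predicted: Observation) -> str:
    # Move forward if the imagined next frame is closer to the goal, otherwise turn.
    return "move_forward" if predicted[0] < obs[0] else "turn_left"


agent = PredictThenActAgent(toy_predictor, toy_policy, history_len=2)
actions = [agent.step([d], "go to the kitchen") for d in (5.0, 4.0, 4.5)]
print(actions)  # ['move_forward', 'move_forward', 'turn_left']
```

Keeping the predictor and policy as separate callables mirrors the abstract's split between predicting observations and generating actions, so a learned model could replace `toy_predictor` without touching the loop.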

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Diffusion