PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models

Jiansong Wan; Chengming Zhou; Jinkua Liu; Xiangge Huang; Xiaoyu Chen; Xiaohan Yi; Qisen Yang; Baiting Zhu; Xin-Qiang Cai; Lixing Liu; Rushuai Yang; Chuheng Zhang; Sherif Abdelfattah; Hayong Shin; Pushi Zhang; Li Zhao; Jiang Bian

arXiv:2507.17220·cs.CV·July 24, 2025

Open Access · 5 Models · 3 Datasets

TL;DR

PIG-Nav improves pretraining for vision-based robotic navigation through an early-fusion network and auxiliary tasks, reporting a 22.6% average zero-shot gain and a 37.5% gain with fine-tuning over existing models, in both simulated and real-world environments.

Contribution

The paper presents novel pretraining techniques and a data preprocessing pipeline that enhance zero-shot and fine-tuned navigation performance of vision-based models.

Findings

01 · 22.6% average improvement in zero-shot performance

02 · 37.5% improvement with fine-tuning over existing models

03 · Effective in simulated and real-world environments

Abstract

Recent studies have explored pretrained (foundation) models for vision-based robotic navigation, aiming to achieve generalizable navigation and positive transfer across diverse environments while enhancing zero-shot performance in unseen settings. In this work, we introduce PIG-Nav (Pretrained Image-Goal Navigation), a new approach that further investigates pretraining strategies for vision-based navigation models and contributes in two key areas. Model-wise, we identify two critical design choices that consistently improve the performance of pretrained navigation models: (1) integrating an early-fusion network structure that combines visual observations and goal images via an appropriately pretrained Vision Transformer (ViT) image encoder, and (2) introducing suitable auxiliary tasks to enhance global navigation representation learning, thus further improving navigation performance.…
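
To make the early-fusion idea in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the PIG-Nav architecture: the function names (`patchify`, `self_attention`, `early_fusion_features`), the segment flag, the identity attention projections, and the mean pooling are all hypothetical choices made for brevity. The only point it shows is that observation and goal patches are joined into one token sequence before the encoder, so attention can relate the two images from the first layer.

```python
import math


def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors.

    Assumes H and W are multiples of `patch`.
    """
    h, w = len(image), len(image[0])
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tokens.append(
                [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            )
    return tokens


def self_attention(tokens):
    """Single-head self-attention with identity projections (illustration only).

    A real ViT layer would use learned query/key/value projections.
    """
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(
            [sum(wt / total * v[d] for wt, v in zip(weights, tokens)) for d in range(dim)]
        )
    return out


def early_fusion_features(obs, goal, patch=2):
    """Fuse observation and goal tokens BEFORE encoding, then mean-pool."""
    # Hypothetical segment flag: 0.0 marks observation tokens, 1.0 marks goal tokens.
    obs_tokens = [t + [0.0] for t in patchify(obs, patch)]
    goal_tokens = [t + [1.0] for t in patchify(goal, patch)]
    # One joint sequence, so every token can attend to both images.
    fused = self_attention(obs_tokens + goal_tokens)
    dim = len(fused[0])
    return [sum(tok[d] for tok in fused) / len(fused) for d in range(dim)]
```

A late-fusion design would instead encode each image separately and concatenate the two pooled vectors afterward, so the encoder never sees observation and goal patches side by side.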

Taxonomy

Topics: Inertial Sensor and Navigation · Robotics and Sensor-Based Localization · Satellite Image Processing and Photogrammetry