PanoNav: Mapless Zero-Shot Object Navigation with Panoramic Scene Parsing and Dynamic Memory
Qunchao Jin, Yilin Wu, Changhao Chen

TL;DR
PanoNav is a novel mapless zero-shot object navigation system that uses panoramic scene parsing and dynamic memory to improve spatial reasoning and decision-making in unseen environments, outperforming existing methods.
Contribution
It introduces a fully RGB-only framework combining panoramic scene parsing with a dynamic memory mechanism for better navigation in unseen environments.
Findings
Significantly outperforms baselines in Success Rate (SR) and Success weighted by Path Length (SPL)
Effective integration of panoramic scene parsing with memory-guided decision-making
Achieves robust navigation without depth sensors or prebuilt maps
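SR and SPL are the standard object-navigation metrics. SR is the fraction of episodes that end in success. SPL weights each success by the ratio of the shortest-path length l_i to the larger of the agent's actual path length p_i and l_i: SPL = (1/N) Σ S_i · l_i / max(p_i, l_i). The sketch below shows how both are computed, assuming per-episode success flags and path lengths are available. The `Episode` record and the toy numbers are illustrative only, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    success: bool          # S_i: whether the episode ended in success
    shortest_path: float   # l_i: shortest-path length from start to goal
    agent_path: float      # p_i: length of the path the agent actually traveled


def success_rate(episodes: list[Episode]) -> float:
    """SR: fraction of episodes that end in success."""
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: list[Episode]) -> float:
    """SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)."""
    total = 0.0
    for e in episodes:
        if e.success:
            # Failed episodes contribute 0; detours shrink the credit below 1.
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


episodes = [
    Episode(success=True, shortest_path=5.0, agent_path=10.0),   # success with a detour
    Episode(success=True, shortest_path=4.0, agent_path=4.0),    # optimal success
    Episode(success=False, shortest_path=6.0, agent_path=12.0),  # failure
]
print(success_rate(episodes))  # 2 of 3 episodes succeeded
print(spl(episodes))           # (0.5 + 1.0 + 0) / 3 → 0.5
```

SPL is always at most SR: it counts the same successes but discounts each one by path efficiency. That is why both metrics are usually reported together.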
Abstract
Zero-shot object navigation (ZSON) in unseen environments remains a challenging problem for household robots, requiring strong perceptual understanding and decision-making capabilities. While recent methods leverage metric maps and Large Language Models (LLMs), they often depend on depth sensors or prebuilt maps, limiting the spatial reasoning ability of Multimodal Large Language Models (MLLMs). Mapless ZSON approaches have emerged to address this, but they typically make short-sighted decisions, leading to local deadlocks due to a lack of historical context. We propose PanoNav, a fully RGB-only, mapless ZSON framework that integrates a Panoramic Scene Parsing module to unlock the spatial parsing potential of MLLMs from panoramic RGB inputs, and a Memory-guided Decision-Making mechanism enhanced by a Dynamic Bounded Memory Queue to incorporate exploration history and avoid local…
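The abstract describes a Memory-guided Decision-Making mechanism backed by a Dynamic Bounded Memory Queue that supplies exploration history to the decision step. It gives no implementation details. The following is therefore only a hypothetical sketch of the general idea: a fixed-capacity FIFO of past steps that is serialized as text context and used to discourage repeating recent choices. `MemoryEntry`, `BoundedMemoryQueue`, `choose_direction`, the direction labels, and the prompt format are all assumptions, not PanoNav's actual design. The sketch also does not model how the queue's capacity or contents might be adjusted dynamically.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    step: int
    direction: str      # hypothetical: panoramic sector chosen, e.g. "front"
    scene_summary: str  # hypothetical: short text from a scene-parsing step


class BoundedMemoryQueue:
    """Hypothetical fixed-capacity FIFO of exploration history."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) automatically evicts the oldest entry when full.
        self._entries: deque = deque(maxlen=capacity)

    def push(self, entry: MemoryEntry) -> None:
        self._entries.append(entry)

    def recent_directions(self) -> list[str]:
        return [e.direction for e in self._entries]

    def to_prompt(self) -> str:
        """Serialize history as text context for an MLLM prompt (assumed format)."""
        return "\n".join(
            f"step {e.step}: went {e.direction}; saw {e.scene_summary}"
            for e in self._entries
        )


def choose_direction(candidates: list[str], memory: BoundedMemoryQueue) -> str:
    """Toy stand-in for memory-guided decisions: prefer a candidate that is
    not in recent history; fall back to the first candidate otherwise."""
    visited = set(memory.recent_directions())
    for c in candidates:
        if c not in visited:
            return c
    return candidates[0]


memory = BoundedMemoryQueue(capacity=2)
memory.push(MemoryEntry(0, "front", "hallway"))
memory.push(MemoryEntry(1, "left", "kitchen doorway"))
memory.push(MemoryEntry(2, "front", "end of hallway"))  # evicts step 0
print(memory.recent_directions())                          # ['left', 'front']
print(choose_direction(["front", "left", "right"], memory))  # right
```

A bounded queue keeps the history passed to the model at a fixed size, so prompt length does not grow with episode length. The trade-off is that information older than the capacity is lost.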
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Robotic Path Planning Algorithms
