$NavA^3$: Understanding Any Instruction, Navigating Anywhere, Finding Anything
Lingfeng Zhang, Xiaoshuai Hao, Yingbo Tang, Haoxiang Fu, Xinyu Zheng, Pengwei Wang, Zhongyuan Wang, Wenbo Ding, Shanghang Zhang

TL;DR
$NavA^3$ introduces a hierarchical framework for complex, open-ended embodied navigation that understands high-level instructions and localizes objects in real-world environments, outperforming existing methods.
Contribution
The paper presents a novel hierarchical navigation framework combining reasoning-based global policies with a large-scale spatial-aware object dataset and localization model, enabling advanced open-vocabulary navigation.
Findings
Achieves state-of-the-art navigation performance in real-world tasks.
Successfully completes long-horizon navigation across different robot embodiments.
Provides a new dataset and model for spatial-aware object localization.
Abstract
Embodied navigation is a fundamental capability of embodied intelligence, enabling robots to move and interact within physical environments. However, existing navigation tasks primarily focus on predefined object navigation or instruction following, which significantly differs from human needs in real-world scenarios involving complex, open-ended scenes. To bridge this gap, we introduce a challenging long-horizon navigation task that requires understanding high-level human instructions and performing spatial-aware object navigation in real-world environments. Existing embodied navigation methods struggle with such tasks due to their limitations in comprehending high-level human instructions and localizing objects with an open vocabulary. In this paper, we propose $NavA^3$, a hierarchical framework divided into two stages: global and local policies. In the global policy, we leverage the…
