HyPerNav: Hybrid Perception for Object-Oriented Navigation in Unknown Environment
Zecheng Yin, Hao Zhao, Zhen Li

TL;DR
HyPerNav introduces a hybrid perception approach combining egocentric RGB-D data and top-down maps, leveraging vision-language models to improve object-oriented navigation in unknown environments, achieving state-of-the-art results.
Contribution
The paper presents a novel hybrid perception framework using vision-language models to integrate local and global perception for autonomous robot navigation.
Findings
Achieved state-of-the-art performance in simulation and real-world tests.
Hybrid perception captures richer cues for more effective object finding.
Both perception modalities contribute significantly to navigation success.
Abstract
Objective-oriented navigation (ObjNav) enables a robot to navigate to a target object directly and autonomously in an unknown environment. Effective perception is critical for autonomous robots navigating unknown environments. While egocentric observations from RGB-D sensors provide abundant local information, real-time top-down maps offer valuable global context for ObjNav. Nevertheless, most existing studies focus on a single source and seldom integrate these two complementary perceptual modalities, even though humans naturally attend to both. With the rapid advancement of Vision-Language Models (VLMs), we propose Hybrid Perception Navigation (HyPerNav), which leverages VLMs' strong reasoning and vision-language understanding capabilities to jointly perceive local and global information, enhancing the effectiveness and intelligence of navigation in unknown…
