HyPerNav: Hybrid Perception for Object-Oriented Navigation in Unknown Environment

Zecheng Yin; Hao Zhao; Zhen Li

arXiv:2510.22917 · cs.RO · October 29, 2025

TL;DR

HyPerNav introduces a hybrid perception approach that combines egocentric RGB-D observations with top-down maps and uses vision-language models to improve object-oriented navigation in unknown environments, achieving state-of-the-art results.

Contribution

The paper presents a novel hybrid perception framework using vision-language models to integrate local and global perception for autonomous robot navigation.

Findings

1. Achieved state-of-the-art performance in simulation and real-world tests.
2. Hybrid perception captures richer cues for more effective object finding.
3. Both perception modalities contribute significantly to navigation success.

Abstract

Object-oriented navigation (ObjNav) enables a robot to navigate directly and autonomously to a target object in an unknown environment. Effective perception is critical for autonomous robots navigating such environments. While egocentric observations from RGB-D sensors provide abundant local information, real-time top-down maps offer valuable global context for ObjNav. Nevertheless, most existing studies rely on a single source and seldom integrate these two complementary perceptual modalities, even though humans naturally attend to both. Building on the rapid advancement of Vision-Language Models (VLMs), we propose Hybrid Perception Navigation (HyPerNav), which leverages VLMs' strong reasoning and vision-language understanding capabilities to jointly perceive local and global information, enhancing the effectiveness and intelligence of navigation in unknown…
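The abstract does not describe how HyPerNav builds its real-time top-down map. As a rough illustration of how egocentric depth can feed a global view, here is a minimal sketch of one common approach (not necessarily the authors'): back-project each depth pixel with a pinhole camera model, transform the point by the robot pose, and bin it into a 2D grid. The function name `depth_to_topdown`, the intrinsics `fx, fy, cx, cy`, the level-camera assumption, and the default camera height, grid resolution, and obstacle-height band are all hypothetical choices made for illustration.

```python
import math
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def depth_to_topdown(
    depth: List[List[float]],
    fx: float, fy: float, cx: float, cy: float,
    pose: Tuple[float, float, float],
    camera_height: float = 0.88,   # hypothetical: camera height above the floor (m)
    resolution: float = 0.05,      # hypothetical: grid cell size (m)
    min_obstacle_h: float = 0.1,   # hypothetical: lowest height counted as obstacle (m)
    max_obstacle_h: float = 1.5,   # hypothetical: highest height counted as obstacle (m)
) -> Tuple[Set[Cell], Set[Cell]]:
    """Project one egocentric depth frame into top-down grid cells.

    Illustrative sketch only; assumes a level (zero-pitch) pinhole camera and a
    robot pose (x, y, yaw) in world coordinates. Returns (explored, occupied):
    the cells containing any valid depth point, and the subset whose point lies
    within the obstacle-height band.
    """
    x0, y0, yaw = pose
    fwd = (math.cos(yaw), math.sin(yaw))     # robot forward direction in the world
    right = (math.sin(yaw), -math.cos(yaw))  # robot right-hand direction in the world
    explored: Set[Cell] = set()
    occupied: Set[Cell] = set()
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0.0 or math.isinf(d) or math.isnan(d):
                continue  # skip invalid depth readings
            x_cam = (u - cx) * d / fx       # metres right of the optical axis
            y_cam = (v - cy) * d / fy       # metres below the optical axis
            height = camera_height - y_cam  # point height above the floor
            wx = x0 + d * fwd[0] + x_cam * right[0]
            wy = y0 + d * fwd[1] + x_cam * right[1]
            cell = (math.floor(wx / resolution), math.floor(wy / resolution))
            explored.add(cell)
            if min_obstacle_h <= height <= max_obstacle_h:
                occupied.add(cell)
    return explored, occupied
```

Calling the function once per frame and unioning its results into running sets (for example `global_occupied |= occupied`) accumulates a global map as the robot moves; only endpoint cells are marked, so a fuller version would also ray-trace free space between the camera and each point.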