Think before Go: Hierarchical Reasoning for Image-goal Navigation

Pengna Li; Kangyi Wu; Shaoqing Xu; Fang Li; Lin Zhao; Long Chen; Zhi-Xin Yang; Nanning Zheng

arXiv:2604.17407 · cs.RO · April 21, 2026

TL;DR

The paper introduces HRNav, a hierarchical framework for image-goal navigation that combines high-level planning with low-level reinforcement learning to improve navigation efficiency and reduce wandering.

Contribution

The paper proposes a hierarchical reasoning approach that decomposes image-goal navigation into high-level planning and low-level execution, and reports improved performance over existing end-to-end methods.

Findings

01. HRNav outperforms existing methods in simulation and real-world tests.

02. The hierarchical approach reduces wandering and improves goal-reaching success.

03. The Wandering Suppression Penalty effectively minimizes unnecessary exploration.

Abstract

Image-goal navigation steers an agent to a target location specified by an image in unseen environments. Existing methods primarily handle this task by learning an end-to-end navigation policy, which compares the similarities of target and observation images and directly predicts the actions. However, when the target is distant or lies in another room, such methods fail to extract informative visual cues, leading the agent to wander around. Motivated by the human cognitive principle that deliberate, high-level reasoning guides fast, reactive execution in complex tasks, we propose Hierarchical Reasoning Navigation (HRNav), a framework that decomposes image-goal navigation into high-level planning and low-level execution. In high-level planning, a vision-language model is trained on a self-collected dataset to generate a short-horizon plan, such as whether the agent should walk through…
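
The split between high-level planning and low-level execution described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `HierarchicalAgent`, the `replan_every` interval, the string-based observations, and the plan labels `leave_room` and `search_room` are all hypothetical stand-ins chosen for illustration. In HRNav the planner is a trained vision-language model and the observations are images; here both are replaced by toy functions so the control flow can be run as written.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins: real observations and goals would be images.
Observation = str
Plan = str
Action = str


@dataclass
class HierarchicalAgent:
    # High level: (observation, goal) -> short-horizon plan.
    planner: Callable[[Observation, Observation], Plan]
    # Low level: (observation, goal, plan) -> primitive action.
    policy: Callable[[Observation, Observation, Plan], Action]
    # Assumed replanning interval; the source does not state one.
    replan_every: int = 5

    def run(self, observations: List[Observation],
            goal: Observation) -> List[Tuple[Plan, Action]]:
        plan: Optional[Plan] = None
        trace = []
        for step, obs in enumerate(observations):
            # The slow, deliberate planner is queried only occasionally.
            if step % self.replan_every == 0:
                plan = self.planner(obs, goal)
            # The fast, reactive policy acts at every step under the current plan.
            trace.append((plan, self.policy(obs, goal, plan)))
        return trace


def toy_planner(obs: Observation, goal: Observation) -> Plan:
    # Invented rule: compare the room prefix of "room:view" strings.
    same_room = obs.split(":")[0] == goal.split(":")[0]
    return "search_room" if same_room else "leave_room"


def toy_policy(obs: Observation, goal: Observation, plan: Plan) -> Action:
    # Invented mapping from plan label to a primitive action.
    return "move_forward" if plan == "leave_room" else "turn_left"


agent = HierarchicalAgent(toy_planner, toy_policy, replan_every=2)
trace = agent.run(["hall:a", "bedroom:b", "bedroom:c"], "bedroom:goal")
```

In this toy run the plan chosen at step 0 is reused at step 1 and only refreshed at step 2, which is the point of the decomposition: the low-level policy always acts under an explicit short-horizon plan instead of relying solely on image similarity to the goal.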
