ReasonNavi: Human-Inspired Global Map Reasoning for Zero-Shot Embodied Navigation
Yuzhuo Ao, Anbang Wang, Yu-Wing Tai, Chi-Keung Tang

TL;DR
ReasonNavi introduces a human-inspired, zero-shot embodied navigation framework that combines multimodal language models with deterministic planning to improve global reasoning and efficiency without extensive training.
Contribution
ReasonNavi is a framework that couples multimodal large language models (MLLMs) with deterministic planners for global map reasoning in embodied navigation, so the MLLM needs no fine-tuning.
Findings
Outperforms prior methods on three navigation tasks.
Does not require MLLM fine-tuning or extensive scene modeling.
Provides a scalable and interpretable navigation solution.
Abstract
Embodied agents often struggle with efficient navigation because they rely primarily on partial egocentric observations, which restrict global foresight and lead to inefficient exploration. In contrast, humans plan using maps: we reason globally first, then act locally. We introduce ReasonNavi, a human-inspired framework that operationalizes this reason-then-act paradigm by coupling Multimodal Large Language Models (MLLMs) with deterministic planners. ReasonNavi converts a top-down map into a discrete reasoning space by segmenting rooms and sampling candidate target nodes. An MLLM is then queried in a multi-stage process to identify the candidate most consistent with the instruction (object, image, or text goal), effectively leveraging the model's semantic reasoning ability while sidestepping its weakness in continuous coordinate prediction. The selected waypoint is grounded into…
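The reason-then-act idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the grid layout, the room labels, the one-candidate-per-room sampling rule, and every function name below are hypothetical. A keyword matcher stands in for the MLLM query, and breadth-first search stands in for the deterministic planner, since the truncated abstract does not say which planner is used or how the waypoint is grounded.

```python
from collections import deque

# Hypothetical top-down map: '#' is a wall, '.' is a doorway,
# letters are room labels produced by some room-segmentation step.
GRID = [
    "#########",
    "#KKK#BBB#",
    "#KKK.BBB#",
    "#KKK#BBB#",
    "#########",
]
ROOM_NAMES = {"K": "kitchen", "B": "bedroom"}  # assumed label names


def sample_candidates(grid):
    """Return one candidate node per room: the room cell nearest the room centroid."""
    rooms = {}
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch.isalpha():
                rooms.setdefault(ch, []).append((r, c))
    candidates = {}
    for label, cells in rooms.items():
        cr = sum(r for r, _ in cells) / len(cells)
        cc = sum(c for _, c in cells) / len(cells)
        candidates[label] = min(
            cells, key=lambda p: (p[0] - cr) ** 2 + (p[1] - cc) ** 2
        )
    return candidates


def keyword_score(instruction, label):
    """Stub standing in for an MLLM query: 1.0 if the room name is mentioned."""
    return 1.0 if ROOM_NAMES[label] in instruction.lower() else 0.0


def choose_candidate(candidates, instruction, score_fn=keyword_score):
    """Pick from a discrete candidate set instead of predicting raw coordinates."""
    best = max(candidates, key=lambda label: score_fn(instruction, label))
    return candidates[best]


def plan_path(grid, start, goal):
    """Deterministic shortest path on a 4-connected grid via breadth-first search."""
    queue = deque([start])
    parent = {start: None}
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] != "#" and (nr, nc) not in parent:
                parent[(nr, nc)] = cur
                queue.append((nr, nc))
    return None  # goal unreachable


candidates = sample_candidates(GRID)
goal = choose_candidate(candidates, "Find the bed in the bedroom")
path = plan_path(GRID, start=(1, 1), goal=goal)
```

The point of the split is that the selector only ever chooses among a handful of named candidates, while all geometric work (collision-free routing through the doorway) stays in the planner. Swapping `keyword_score` for a real model call would not change `plan_path`.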
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Action Observation and Synchronization
