Object Navigation with Structure-Semantic Reasoning-Based Multi-level Map and Multimodal Decision-Making LLM
Chongshang Yan, Jiaxuan He, Delun Li, Yi Yang, Wenjie Song

TL;DR
This paper introduces a novel hierarchical reasoning framework with a structured map and multimodal decision-making for zero-shot object navigation, significantly enhancing success rates and efficiency in unknown environments.
Contribution
It proposes an Environmental Attributes Map and MLLM Hierarchical Reasoning module that leverage scene regularities and multimodal reasoning to improve object navigation performance.
Findings
EAM achieves 64.5% scene mapping accuracy on MP3D.
Navigation success rates improve, with SPL (Success weighted by Path Length) reaching 28.4% and 26.3%.
Significant performance gains over baseline methods.
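For readers unfamiliar with the SPL figures above: SPL is usually computed per episode as the success flag times the ratio of the shortest-path length to the longer of the shortest and the travelled path, then averaged over episodes. The sketch below follows that standard definition. It is not taken from the paper's code, and the function name, argument names, and the zero-length handling are assumptions made for illustration.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest_lengths: Sequence[float],
        path_lengths: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over episodes.

    successes[i]        -- whether episode i reached the target
    shortest_lengths[i] -- geodesic shortest-path length for episode i
    path_lengths[i]     -- length of the path the agent actually took
    """
    n = len(successes)
    if n == 0 or n != len(shortest_lengths) or n != len(path_lengths):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        # Assumption: an episode that starts at the goal (zero length)
        # counts as perfectly efficient.
        ratio = shortest / denom if denom > 0 else 1.0
        total += float(success) * ratio
    return total / n


if __name__ == "__main__":
    # Three toy episodes: optimal success, failure, success at twice the optimal length.
    print(spl([True, False, True], [10.0, 10.0, 5.0], [10.0, 20.0, 10.0]))
```

A failed episode contributes zero regardless of path length, so SPL can never exceed the plain success rate.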
Abstract
Zero-shot object navigation (ZSON) in unknown, open-ended environments with semantically novel targets often suffers a significant decline in performance because high-dimensional implicit scene information is neglected and the target search is long-range. To address this, we propose an active object navigation framework with an Environmental Attributes Map (EAM) and an MLLM Hierarchical Reasoning module (MHR) to improve success rate and efficiency. EAM is constructed by reasoning about observed environments with SBERT and predicting unobserved ones with a diffusion model, exploiting the regularities of human spaces that underlie object-room correlations and area adjacencies. MHR is informed by EAM when making frontier exploration decisions, avoiding circuitous trajectories in long-range scenarios and thereby improving path efficiency. Experimental results demonstrate that the EAM module…
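The idea that object-room correlations can bias frontier choice can be illustrated with a toy scorer. The sketch below is not the paper's EAM or MHR: the paper uses SBERT embeddings, a diffusion model, and an MLLM, whereas here hand-written 2-D vectors stand in for text embeddings and a fixed linear distance penalty stands in for the decision module. All names (`cosine`, `pick_frontier`, `distance_weight`) and all numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (frontier name, embedding of the room predicted beyond it, travel distance in metres)
Frontier = Tuple[str, Sequence[float], float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_frontier(target_vec: Sequence[float],
                  frontiers: List[Frontier],
                  distance_weight: float = 0.05) -> str:
    """Return the frontier with the best semantic-match-minus-distance score."""
    if not frontiers:
        raise ValueError("no frontiers to choose from")

    def score(frontier: Frontier) -> float:
        _, room_vec, distance = frontier
        return cosine(target_vec, room_vec) - distance_weight * distance

    return max(frontiers, key=score)[0]


if __name__ == "__main__":
    # Toy stand-ins for embeddings: the target lies close to the "kitchen" direction.
    target = [1.0, 0.0]
    candidates = [
        ("kitchen_door", [0.9, 0.1], 6.0),
        ("hallway", [0.1, 0.9], 2.0),
    ]
    print(pick_frontier(target, candidates))
```

With these toy values the farther but semantically matching frontier wins, which is the kind of trade-off a room-aware map is meant to expose to the planner.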
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Robotic Path Planning Algorithms
Methods: Diffusion · Sentence-BERT
