TL;DR
MerNav introduces a novel Memory-Execute-Review framework that significantly improves zero-shot object goal navigation success rates and generalization across multiple datasets, validated both in simulation and real-world robot deployment.
Contribution
The paper presents a generalizable framework that combines memory, execute, and review modules, and it reports higher success rate and better generalization than existing methods on the Object Goal Navigation task.
Findings
Achieved 7-8% higher success rate than baselines across datasets.
Outperformed all supervised fine-tuning methods in success and generalization.
Successfully deployed on a humanoid robot in real-world environments.
Abstract
Visual Language Navigation (VLN) is one of the fundamental capabilities for embodied intelligence and a critical challenge that urgently needs to be addressed. However, existing methods are still unsatisfactory in terms of both success rate (SR) and generalization: Supervised Fine-Tuning (SFT) approaches typically achieve higher SR, while Training-Free (TF) approaches often generalize better, but it is difficult to obtain both simultaneously. To this end, we propose a Memory-Execute-Review framework. It consists of three parts: a hierarchical memory module for providing information support, an execute module for routine decision-making and actions, and a review module for handling abnormal situations and correcting behavior. We validated the effectiveness of this framework on the Object Goal Navigation task. Across 4 datasets, our average SR achieved absolute improvements of 7% and 5%…
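To make the division of labor among the three modules concrete, here is a minimal sketch of a memory-execute-review loop on a toy grid. It is not the paper's implementation. Every name (`HierarchicalMemory`, `execute`, `review`, `correct`, `navigate`), the grid world, the greedy policy, and the oscillation check are assumptions chosen for illustration only; the abstract does not specify how each module works internally.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical toy example: all names and rules here are illustrative assumptions.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


@dataclass
class HierarchicalMemory:
    """Two illustrative levels: a short buffer of recent cells and a long-term visit count."""
    short_term: deque = field(default_factory=lambda: deque(maxlen=4))
    long_term: dict = field(default_factory=dict)  # cell -> number of visits

    def write(self, cell):
        self.short_term.append(cell)
        self.long_term[cell] = self.long_term.get(cell, 0) + 1


def neighbors(cell, walls, size):
    """Free cells reachable in one step, in the fixed order given by MOVES."""
    out = []
    for dx, dy in MOVES:
        nxt = (cell[0] + dx, cell[1] + dy)
        if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in walls:
            out.append(nxt)
    return out


def dist(a, b):
    """Manhattan distance between two cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def execute(cell, goal, walls, size):
    """Routine decision: step to the free neighbor closest to the goal."""
    return min(neighbors(cell, walls, size), key=lambda c: dist(c, goal))


def review(memory):
    """Abnormal-situation check: the recent buffer is full but covers at most two cells."""
    recent = memory.short_term
    return len(recent) == recent.maxlen and len(set(recent)) <= 2


def correct(cell, goal, memory, walls, size):
    """Corrective decision: prefer the least-visited neighbor, then the one closest to the goal."""
    return min(
        neighbors(cell, walls, size),
        key=lambda c: (memory.long_term.get(c, 0), dist(c, goal)),
    )


def navigate(start, goal, walls, size, max_steps=50, use_review=True):
    """Run the loop; return the visited path and how often the review step intervened."""
    memory = HierarchicalMemory()
    cell, path, reviews = start, [start], 0
    memory.write(cell)
    for _ in range(max_steps):
        if cell == goal:
            break
        if use_review and review(memory):
            cell = correct(cell, goal, memory, walls, size)
            reviews += 1
        else:
            cell = execute(cell, goal, walls, size)
        memory.write(cell)
        path.append(cell)
    return path, reviews


if __name__ == "__main__":
    path, reviews = navigate((0, 2), (4, 2), walls={(2, 2)}, size=5)
    print(path, reviews)
```

In this toy run, the greedy execute step alone bounces between two cells in front of the single wall cell. The review step detects the repetition from short-term memory and uses the long-term visit counts to pick an unexplored neighbor, after which routine execution reaches the goal. Passing `use_review=False` shows the failure mode the review step is meant to catch.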
