MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation

Dekang Qi; Shuang Zeng; Xinyuan Chang; Feng Xiong; Shichao Xie; Xiaolong Wu; Mu Xu

arXiv:2602.05467 · cs.CV · April 14, 2026

TL;DR

MerNav introduces a Memory-Execute-Review framework that improves both success rate and generalization in zero-shot object goal navigation across 4 datasets, validated in simulation and in real-world robot deployment.

Contribution

The paper presents a framework that combines a hierarchical memory module, an execute module, and a review module, and reports higher success rate and better generalization than existing methods on the Object Goal Navigation task.

Findings

01 Achieved 7-8% higher success rate than baselines across datasets.

02 Outperformed all supervised fine-tuning methods in success and generalization.

03 Successfully deployed on a humanoid robot in real-world environments.

Abstract

Visual Language Navigation (VLN) is one of the fundamental capabilities for embodied intelligence and a critical challenge that urgently needs to be addressed. However, existing methods are still unsatisfactory in terms of both success rate (SR) and generalization: Supervised Fine-Tuning (SFT) approaches typically achieve higher SR, while Training-Free (TF) approaches often generalize better, but it is difficult to obtain both simultaneously. To this end, we propose a Memory-Execute-Review framework. It consists of three parts: a hierarchical memory module for providing information support, an execute module for routine decision-making and actions, and a review module for handling abnormal situations and correcting behavior. We validated the effectiveness of this framework on the Object Goal Navigation task. Across 4 datasets, our average SR achieved absolute improvements of 7% and 5%…
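The three-part structure described in the abstract can be pictured as a simple control loop. The sketch below is a toy illustration only, not the authors' implementation: the `Memory` class, the `execute`, `review`, and `navigate` functions, the string observations, and the "turn after repeated observations" rule are all hypothetical stand-ins chosen to show how a memory module, a routine policy, and a corrective review step might interact.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical two-level memory: recent steps plus a long-term summary."""
    short_term: list = field(default_factory=list)
    long_term: dict = field(default_factory=dict)

    def record(self, observation, action):
        # Short-term: the raw (observation, action) history.
        self.short_term.append((observation, action))
        # Long-term (toy summary): how often each observation has been seen.
        self.long_term[observation] = self.long_term.get(observation, 0) + 1


def execute(observation, goal, memory):
    """Routine decision (placeholder policy): stop at the goal, else go forward."""
    return "stop" if observation == goal else "forward"


def review(observation, action, memory, max_repeats=2):
    """Abnormal-situation check (placeholder): if the same observation has
    already been recorded max_repeats times, override the action with a turn."""
    if action != "stop" and memory.long_term.get(observation, 0) >= max_repeats:
        return "turn"
    return action


def navigate(observations, goal, max_steps=10):
    """Run the memory -> execute -> review loop over a fixed observation stream."""
    memory = Memory()
    actions = []
    for observation in observations[:max_steps]:
        action = execute(observation, goal, memory)
        action = review(observation, action, memory)
        memory.record(observation, action)
        actions.append(action)
        if action == "stop":
            break
    return actions


if __name__ == "__main__":
    print(navigate(["hall", "hall", "hall", "chair"], goal="chair"))
```

In this toy run the agent moves forward twice, the review step overrides the third action because the same observation keeps recurring, and the loop ends once the goal is observed. A real system would replace the string observations and hand-written rules with perception and model-based decisions.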

Code & Models

Repositories

https://qidekang.github.io/MerNav.github.io
github
