LightZeroNav: Zero-Shot Vision Language Navigation in Continuous Environments Based on Lightweight VLMs

Kun Luo, Xiangyu Dong, Xiaoguang Ma, Haoran Zhao, and Yaoming Zhou

arXiv:2603.16947 · cs.CV · May 19, 2026

TL;DR

LightZeroNav is a zero-shot vision-language navigation method for continuous environments (VLN-CE) that uses only RGB observations and a lightweight Qwen3-VL-8B backbone. Without task-specific training, it reports performance competitive with GPT-4o.

Contribution

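The paper presents LightZeroNav, an approach to zero-shot VLN-CE that uses only RGB inputs and a lightweight VLM. It targets three bottlenecks of lightweight VLMs in this setting: information redundancy from multi-source inputs, inaccurate progress estimation caused by noisy textual memory, and task entanglement between action execution and stage transition.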

Findings

01

Performs competitively with GPT-4o (~200B) in zero-shot VLN-CE while using the open-source Qwen3-VL-8B backbone.

02

Effectively handles information redundancy and noisy textual memory.

03

Operates without task-specific training, graph search, or waypoint predictors.

Abstract

Although vision-language navigation (VLN) has progressed rapidly, zero-shot VLN in continuous environments (VLN-CE) remains highly challenging when using lightweight vision-language models (VLMs), whose limited reasoning capacity makes long-horizon navigation unreliable. In this paper, we propose LightZeroNav to tackle the three major bottlenecks when using lightweight VLMs in zero-shot VLN-CE, i.e., information redundancy from multi-source inputs, inaccurate progress estimation caused by noisy textual memory, and task entanglement between action execution and stage transition. Using only RGB observations and a lightweight open-source Qwen3-VL-8B backbone, LightZeroNav achieves competitive performance with GPT-4o (~200B) without task-specific training, graph search, or waypoint predictors, demonstrating its effectiveness in zero-shot VLN-CE.

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Social Robot Interaction and HRI