# CA-VLN: Collaborative Agents in MLLM-Powered Visual-Language Navigation

**Authors:** Ruolin Zhu, Shaobin Li, Zixing Zhu, Jing Jia, Min Yang

PMC · DOI: 10.3390/s26041254 · 2026-02-14

## TL;DR

This paper introduces CA-VLN, a visual-language navigation framework in which two collaborative agents, powered by multimodal large language models (MLLMs), are used to improve generalization to unseen environments.

## Contribution

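A dual-agent framework that pairs a Knowledge Agent, which supplies semantic context and commonsense reasoning, with a Hierarchical History Agent, which builds episodic memory for long-horizon planning, to improve navigation generalization.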

## Key findings

- The proposed CA-VLN framework achieves state-of-the-art performance on the R2R, REVERIE, and SOON datasets.
- The model significantly improves generalization and navigation success in previously unobserved environments.

## Abstract

Generalization to unseen environments remains a fundamental challenge in Vision-Language Navigation. To tackle this issue, we propose a novel framework that leverages world knowledge embedded within Multimodal Large Language Models. We introduce Collaborative Agents in Visual-Language Navigation (CA-VLN), a framework based on a dual-agent architecture. This architecture comprises a Knowledge Agent, which infuses the action prediction process with semantic context and commonsense reasoning, and a Hierarchical History Agent, which constructs a detailed episodic memory to enable long-horizon planning. The collaboration between these agents facilitates a dynamic interplay between high-level semantic understanding and grounded episodic experience. Extensive experiments on the R2R, REVERIE and SOON datasets demonstrate that our model achieves state-of-the-art performance, significantly improving generalization and navigation success in previously unobserved environments.
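The abstract does not say how the two agents exchange information or how their outputs are fused, so the following is only a minimal Python sketch of one way a dual-agent decision step could be wired. Everything in it is an assumption made for illustration: the class and function names (`Observation`, `KnowledgeAgent`, `HierarchicalHistoryAgent`, `choose_action`), the word-overlap score standing in for an MLLM query, the revisit penalty, and the additive fusion of the two scores. It is not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One navigation step: current viewpoint and candidate next viewpoints."""
    viewpoint: str
    candidates: dict[str, str]  # candidate viewpoint id -> short scene description


class KnowledgeAgent:
    """Hypothetical stand-in for the MLLM-backed agent.

    A real system would query a multimodal LLM; here the semantic score is
    just the word overlap between the instruction and each description.
    """

    def score(self, instruction: str, obs: Observation) -> dict[str, float]:
        words = set(instruction.lower().split())
        return {
            cand: float(len(words & set(desc.lower().split())))
            for cand, desc in obs.candidates.items()
        }


@dataclass
class HierarchicalHistoryAgent:
    """Hypothetical two-level memory: raw step log plus a set of visited viewpoints."""
    steps: list[Observation] = field(default_factory=list)
    visited: set[str] = field(default_factory=set)

    def update(self, obs: Observation) -> None:
        self.steps.append(obs)
        self.visited.add(obs.viewpoint)

    def score(self, obs: Observation) -> dict[str, float]:
        # Assumed heuristic: discourage returning to viewpoints already in memory.
        return {c: (-1.0 if c in self.visited else 0.0) for c in obs.candidates}


def choose_action(instruction: str, obs: Observation,
                  knowledge: KnowledgeAgent,
                  history: HierarchicalHistoryAgent) -> str:
    """Record the step, then pick the candidate with the best combined score."""
    history.update(obs)
    k = knowledge.score(instruction, obs)
    h = history.score(obs)
    return max(obs.candidates, key=lambda c: k[c] + h[c])  # assumed additive fusion


obs = Observation(
    viewpoint="hall",
    candidates={
        "kitchen": "a kitchen with a sink",
        "hall": "an empty hall",
        "bedroom": "a bedroom with a bed",
    },
)
history = HierarchicalHistoryAgent()
next_vp = choose_action("walk to the kitchen sink", obs, KnowledgeAgent(), history)
print(next_vp)  # kitchen
```

In this toy setup the semantic score favors the candidate whose description matches the instruction, while the memory score pushes the agent away from places it has already been. This loosely mirrors the interplay between semantic understanding and episodic experience that the abstract describes.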

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12944077/full.md

---
Source: https://tomesphere.com/paper/PMC12944077