MA-CoNav: A Master-Slave Multi-Agent Framework with Hierarchical Collaboration and Dual-Level Reflection for Long-Horizon Embodied VLN

Ling Luo; Qianqian Bai

arXiv:2603.03024·cs.RO·March 4, 2026

TL;DR

MA-CoNav is a hierarchical Master-Slave multi-agent framework with dual-level reflection for long-horizon vision-language navigation. It distributes perception, planning, execution, and memory across specialized agents to improve performance in complex environments.

Contribution

The paper proposes a Master-Slave multi-agent architecture with a dual-level reflection mechanism, which divides navigation functions among specialized agents and supports dynamic optimization during VLN tasks.

Findings

01. Outperforms existing VLN methods on real-world indoor datasets

02. Requires no scene-specific fine-tuning of the models

03. Demonstrates significant improvements across multiple evaluation metrics

Abstract

Vision-Language Navigation (VLN) aims to empower robots with the ability to perform long-horizon navigation in unfamiliar environments based on complex linguistic instructions. Its success critically hinges on establishing an efficient "language-understanding – visual-perception – embodied-execution" closed loop. Existing methods often suffer from perceptual distortion and decision drift in complex, long-distance tasks due to the cognitive overload of a single agent. Inspired by distributed cognition theory, this paper proposes MA-CoNav, a Multi-Agent Collaborative Navigation framework. This framework adopts a "Master-Slave" hierarchical agent collaboration architecture, decoupling and distributing the perception, planning, execution, and memory functions required for navigation tasks to specialized agents. Specifically, the Master Agent is responsible for global orchestration,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Robotic Path Planning Algorithms