Cross-Modal Navigation with Multi-Agent Reinforcement Learning

Shuo Liu; Xinzichen Li; Christopher Amato

arXiv:2605.06595·cs.RO·May 8, 2026


TL;DR

CRONA is a multi-agent reinforcement learning framework for cross-modal navigation. Lightweight, modality-specialized agents collaborate through control-relevant auxiliary beliefs and a centralized multi-modal critic with global state, and multi-agent methods outperform single-agent baselines on visual-acoustic navigation tasks.

Contribution

The paper presents CRONA, a novel MARL framework that enables effective cross-modal collaboration among lightweight agents for embodied navigation.

Findings

01. Multi-agent methods outperform single-agent baselines in visual-acoustic navigation.

02. Homogeneous agents with limited modalities suffice for short-range navigation.

03. Heterogeneous agents with complementary modalities improve efficiency and effectiveness.

Abstract

Robust embodied navigation relies on complementary sensory cues. However, high-quality and well-aligned multi-modal data is often difficult to obtain in practice. Training a monolithic model is also challenging as rich multi-modal inputs induce complex representations and substantially enlarge the policy space. Cross-modal collaboration among lightweight modality-specialized agents offers a scalable paradigm. It enables flexible deployment and parallel execution, while preserving the strength of each modality. In this paper, we propose CRONA, a Multi-Agent Reinforcement Learning (MARL) framework for Cross-Modal Navigation. CRONA improves collaboration by leveraging control-relevant auxiliary beliefs and a centralized multi-modal critic with global state. Experiments on visual-acoustic navigation tasks show that multi-agent methods significantly improve…
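To make the idea of modality-specialized agents trained with a centralized critic more concrete, here is a minimal sketch of a generic centralized-training, decentralized-execution actor-critic update. It is not the paper's implementation: the class names (ModalityAgent, CentralCritic), the function ctde_update, the linear models, the four-action space, and the learning rate are all assumptions made for illustration, and the auxiliary beliefs mentioned in the abstract are omitted.

```python
import math
import random

N_ACTIONS = 4  # hypothetical discrete action space (e.g. forward, left, right, stop)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ModalityAgent:
    """Lightweight linear actor that sees only its own modality's features."""

    def __init__(self, obs_dim, n_actions, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                  for _ in range(n_actions)]

    def policy(self, obs):
        logits = [sum(wi * oi for wi, oi in zip(row, obs)) for row in self.w]
        return softmax(logits)

    def act(self, obs, rng):
        probs = self.policy(obs)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, obs, action, advantage, lr):
        # Policy-gradient step for a linear softmax policy:
        # d log pi(a|o) / d w[k] = (1[k == a] - pi_k) * obs
        probs = self.policy(obs)
        for k, row in enumerate(self.w):
            coeff = (1.0 if k == action else 0.0) - probs[k]
            for j in range(len(row)):
                row[j] += lr * advantage * coeff * obs[j]


class CentralCritic:
    """Linear value function over the global state; used only during training."""

    def __init__(self, state_dim):
        self.v = [0.0] * state_dim

    def value(self, state):
        return sum(vi * si for vi, si in zip(self.v, state))

    def update(self, state, target, lr):
        td_error = target - self.value(state)
        for j in range(len(self.v)):
            self.v[j] += lr * td_error * state[j]


def ctde_update(agents, critic, observations, actions, state, reward,
                next_state, done, gamma=0.99, lr=0.05):
    """One shared-reward update: every actor uses the centralized advantage."""
    bootstrap = 0.0 if done else gamma * critic.value(next_state)
    target = reward + bootstrap
    advantage = target - critic.value(state)
    for agent, obs, action in zip(agents, observations, actions):
        agent.update(obs, action, advantage, lr)
    critic.update(state, target, lr)
    return advantage
```

In this sketch each agent acts from its own observation alone, so the agents could run in parallel at deployment, while the critic that sees the concatenated global state is needed only while training.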
