Cross-Modal Navigation with Multi-Agent Reinforcement Learning
Shuo Liu, Xinzichen Li, Christopher Amato

TL;DR
CRONA introduces a multi-agent reinforcement learning framework for cross-modal navigation, enhancing performance and scalability by leveraging specialized agents and centralized critics in complex embodied navigation tasks.
Contribution
The paper presents CRONA, a novel MARL framework that enables effective cross-modal collaboration among lightweight agents for embodied navigation.
Findings
Multi-agent methods outperform single-agent baselines in visual-acoustic navigation.
Homogeneous agents with limited modalities suffice for short-range navigation.
Heterogeneous agents with complementary modalities improve efficiency and effectiveness.
Abstract
Robust embodied navigation relies on complementary sensory cues. However, high-quality and well-aligned multi-modal data is often difficult to obtain in practice. Training a monolithic model is also challenging as rich multi-modal inputs induce complex representations and substantially enlarge the policy space. Cross-modal collaboration among lightweight modality-specialized agents offers a scalable paradigm. It enables flexible deployment and parallel execution, while preserving the strength of each modality. In this paper, we propose CRONA, a Multi-Agent Reinforcement Learning (MARL) framework for Cross-Modal Navigation. CRONA improves collaboration by leveraging control-relevant auxiliary beliefs and a centralized multi-modal critic with global state. Experiments on visual-acoustic navigation tasks show that multi-agent methods significantly improve…
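The general pattern the abstract names, modality-specialized agents acting on their own observations while a centralized critic scores the global state during training, can be illustrated with a minimal sketch. This is not CRONA's architecture: the linear actors, the TD(0) critic update, the feature sizes, the four-action set, and every class and variable name below are assumptions made for illustration, and the auxiliary beliefs mentioned in the abstract are omitted.

```python
import math
import random

rng = random.Random(0)

# Hypothetical sizes, chosen only for this illustration.
N_ACTIONS = 4       # e.g. forward, turn left, turn right, stop
VISUAL_DIM = 8
ACOUSTIC_DIM = 6


class LinearActor:
    """Decentralized policy: sees only its own modality's observation."""

    def __init__(self, obs_dim, n_actions):
        self.n_actions = n_actions
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_actions)]
                  for _ in range(obs_dim)]

    def action_probs(self, obs):
        # Linear logits followed by a numerically stable softmax.
        logits = [sum(o * row[a] for o, row in zip(obs, self.W))
                  for a in range(self.n_actions)]
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, obs):
        return rng.choices(range(self.n_actions),
                           weights=self.action_probs(obs))[0]


class CentralizedCritic:
    """Training-time value estimate computed from the joint (global) state."""

    def __init__(self, state_dim):
        self.w = [0.0] * state_dim

    def value(self, state):
        return sum(s * w for s, w in zip(state, self.w))

    def td_update(self, state, reward, next_state, gamma=0.99, lr=0.01):
        # One TD(0) step on a linear value function.
        td_error = reward + gamma * self.value(next_state) - self.value(state)
        self.w = [w + lr * td_error * s for w, s in zip(self.w, state)]
        return td_error


visual_actor = LinearActor(VISUAL_DIM, N_ACTIONS)
acoustic_actor = LinearActor(ACOUSTIC_DIM, N_ACTIONS)
critic = CentralizedCritic(VISUAL_DIM + ACOUSTIC_DIM)

# Synthetic observations stand in for real visual and acoustic features.
visual_obs = [rng.gauss(0.0, 1.0) for _ in range(VISUAL_DIM)]
acoustic_obs = [rng.gauss(0.0, 1.0) for _ in range(ACOUSTIC_DIM)]
global_state = visual_obs + acoustic_obs   # the critic sees both modalities

# Each agent acts from its own observation only.
actions = (visual_actor.act(visual_obs), acoustic_actor.act(acoustic_obs))
td_error = critic.td_update(global_state, reward=1.0, next_state=global_state)
```

In this sketch only the critic consumes the concatenated state, so each actor could be deployed on its own at execution time. The policy-gradient step that would use the critic's signal to update the actors is left out for brevity.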
