From Pixels to Cooperation: Multi-Agent Reinforcement Learning based on Multimodal World Models
Sureyya Akin, Kavita Srivastava, Prateek B. Kapoor, Pradeep G. Sethi, Sunita Q. Patel, Rahu Srivastava

TL;DR
This paper introduces a multimodal world model that enables sample-efficient cooperative multi-agent reinforcement learning from high-dimensional sensory inputs like pixels and audio, by learning environment dynamics in a latent space.
Contribution
It presents a novel shared multimodal world model that fuses observations and acts as an imagined simulator for efficient policy training in multi-agent settings.
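To make the "imagined simulator" idea concrete, here is a minimal sketch of a rollout that never touches the real environment. It assumes the world model is available as a latent transition function and a reward function. The names `imagine_rollout`, `toy_dynamics`, and `toy_reward` are hypothetical, the dynamics are a hand-written stand-in for a learned model, and the policy update itself (e.g., MAPPO) is omitted.

```python
def imagine_rollout(z0, policies, dynamics, reward_fn, horizon, gamma=0.99):
    """Roll out agent policies inside a latent model (no real environment steps).

    z0:        initial latent state (list of floats)
    policies:  one callable per agent, mapping latent state -> action
    dynamics:  callable (latent, actions) -> next latent (stand-in for a learned model)
    reward_fn: callable (latent, actions) -> shared team reward
    """
    z = list(z0)
    trajectory = []
    discounted_return = 0.0
    for t in range(horizon):
        actions = [pi(z) for pi in policies]   # every agent acts on the shared latent
        reward = reward_fn(z, actions)
        trajectory.append((z, actions, reward))
        discounted_return += (gamma ** t) * reward
        z = dynamics(z, actions)               # imagined transition
    return trajectory, discounted_return


# Toy stand-ins (assumptions for illustration only, not the paper's model).
def toy_dynamics(z, actions):
    push = sum(actions) / len(actions)
    return [0.9 * zi + 0.1 * push for zi in z]


def toy_reward(z, actions):
    return -sum(zi * zi for zi in z)           # team is rewarded for a latent near zero


policies = [lambda z: -z[0], lambda z: -z[1]]  # two hand-written agents
trajectory, ret = imagine_rollout([1.0, -0.5], policies, toy_dynamics, toy_reward, horizon=5)
```

In a full system the collected `trajectory` would feed a cooperative policy-gradient update, while real environment interaction would be needed only to train the world model.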
Findings
Achieves orders-of-magnitude better sample efficiency than baselines.
Multimodal fusion is crucial for task success in sensory-asymmetric environments.
Provides robustness to sensor dropout, aiding real-world deployment.
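One common way to get both multimodal fusion and tolerance to missing sensors is masked attention pooling, sketched below. This is a generic illustration, not the paper's architecture, which the summary does not specify. The function name `fuse_observations`, the fixed `query` vector (learned in a real model), and the example embeddings are all assumptions.

```python
import math


def fuse_observations(embeddings, query, available):
    """Masked scaled dot-product attention pooling over sensor embeddings.

    embeddings: list of equal-length float lists, one per (agent, modality) sensor
    query:      float list of the same length (would be learned in a real model)
    available:  list of bools; False marks a dropped sensor, which gets zero weight
    """
    dim = len(query)
    scores = []
    for emb, ok in zip(embeddings, available):
        if ok:
            dot = sum(q * e for q, e in zip(query, emb))
            scores.append(dot / math.sqrt(dim))
        else:
            scores.append(None)                # masked out of the softmax
    live = [s for s in scores if s is not None]
    if not live:                               # every sensor dropped: return zeros
        return [0.0] * dim, [0.0] * len(embeddings)
    peak = max(live)                           # subtract max for numerical stability
    exps = [0.0 if s is None else math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    fused = [sum(w * emb[i] for w, emb in zip(weights, embeddings)) for i in range(dim)]
    return fused, weights


# Three sensors (e.g., agent 1 camera, agent 1 microphone, agent 2 camera);
# the second one has dropped out.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = fuse_observations(embeddings, query=[1.0, 1.0], available=[True, False, True])
```

Because the softmax is renormalized over the sensors that are present, the fused vector stays a convex combination of live embeddings regardless of how many inputs drop out.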
Abstract
Learning cooperative multi-agent policies directly from high-dimensional, multimodal sensory inputs such as pixels and audio is notoriously sample-inefficient. Model-free Multi-Agent Reinforcement Learning (MARL) algorithms struggle with the joint challenge of representation learning, partial observability, and credit assignment. To address this, we propose a novel framework based on a shared, generative Multimodal World Model (MWM). Our MWM is trained to learn a compressed latent representation of the environment's dynamics by fusing distributed, multimodal observations from all agents using a scalable attention-based mechanism. Subsequently, we leverage this learned MWM as a fast, "imagined" simulator to train cooperative MARL policies (e.g., MAPPO) entirely within its latent space, decoupling representation learning from policy learning. We introduce a new set of…
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
