A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning Coordination Problem
Paul Barde, Jakob Foerster, Derek Nowrouzezahrai, Amy Zhang

TL;DR
This paper introduces MOMA-PPO, a model-based offline multi-agent reinforcement learning algorithm that addresses coordination challenges by generating synthetic interaction data, and reports that it outperforms existing model-free methods on coordination-intensive offline tasks.
Contribution
The paper presents the first model-based approach to offline MARL, overcoming coordination issues and demonstrating superior performance over model-free algorithms in challenging environments.
Findings
MOMA-PPO outperforms model-free methods in coordination tasks
Model-based methods handle partial observability better
Synthetic interaction data improves convergence and policy fine-tuning
Abstract
Training multiple agents to coordinate is an essential problem with applications in robotics, game theory, economics, and social sciences. However, most existing Multi-Agent Reinforcement Learning (MARL) methods are online and thus impractical for real-world applications in which collecting new interactions is costly or dangerous. While these algorithms should leverage offline data when available, doing so gives rise to what we call the offline coordination problem. Specifically, we identify and formalize the strategy agreement (SA) and the strategy fine-tuning (SFT) coordination challenges, two issues at which current offline MARL algorithms fail. Concretely, we reveal that the prevalent model-free methods are severely deficient and cannot handle coordination-intensive offline multi-agent tasks in either toy or MuJoCo domains. To address this setback, we emphasize the importance of…
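To make the strategy agreement challenge more concrete, here is a minimal sketch under stated assumptions. The abstract as shown here does not define SA, so this reflects one plausible reading: an offline dataset contains several equally good joint conventions, and agents that learn independently have no way to agree on which one to follow. The matching game, the dataset, the `independent_values` helper, and the tie-breaking rules are all hypothetical illustrations and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical two-agent matching game: reward 1 if both pick the same action.
def reward(a0, a1):
    return 1.0 if a0 == a1 else 0.0

# Assumed offline dataset mixing two equally good conventions: (0, 0) and (1, 1).
dataset = [(0, 0), (1, 1)] * 50

def independent_values(dataset, agent):
    """Average reward per own action, ignoring what the other agent did."""
    totals, counts = defaultdict(float), defaultdict(int)
    for joint in dataset:
        own = joint[agent]
        totals[own] += reward(*joint)
        counts[own] += 1
    return {a: totals[a] / counts[a] for a in counts}

q0 = independent_values(dataset, agent=0)
q1 = independent_values(dataset, agent=1)

# Both actions look equally good to each agent, so the choice is arbitrary.
# The tie-breaks differ on purpose here to show the failure case.
a0 = min(q0, key=lambda a: (-q0[a], a))   # ties -> lowest action index
a1 = min(q1, key=lambda a: (-q1[a], -a))  # ties -> highest action index

joint_reward = reward(a0, a1)
print(q0, q1, (a0, a1), joint_reward)
```

In this toy setup every action has the same estimated value for each agent, yet the jointly selected pair earns no reward, because nothing in the fixed dataset tells the agents which convention the other one will pick.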
Taxonomy
Topics: Reinforcement Learning in Robotics · Elevator Systems and Control
Methods: fail
