Scalable Multi-Agent Offline Reinforcement Learning and the Role of Information
Riccardo Zamboni, Enrico Brunetti, Marcello Restelli

TL;DR
This paper introduces SCAM-FQI, a scalable multi-agent offline RL method that balances dataset collection and policy learning through structured information sharing, ensuring convergence to near-optimal policies.
Contribution
It proposes a novel scalable routine for dataset collection and offline learning in multi-agent RL, with theoretical convergence guarantees and bounds based on shared information.
Findings
SCAM-FQI converges to near-optimal policies with high probability.
The approach balances scalability and performance in multi-agent offline RL.
Empirical results support theoretical convergence and effectiveness.
Abstract
Offline Reinforcement Learning (RL) focuses on learning policies solely from a batch of previously collected data, offering the potential to leverage such datasets effectively without the need for costly or risky active exploration. While recent advances in Offline Multi-Agent RL (MARL) have shown promise, most existing methods rely either on large datasets jointly collected by all agents or on agent-specific datasets collected independently. The former approach ensures strong performance but raises scalability concerns, while the latter emphasizes scalability at the expense of performance guarantees. In this work, we propose a novel scalable routine for both dataset collection and offline learning. Agents first collect diverse datasets coherently with a pre-specified information-sharing network and subsequently learn coherent localized policies without requiring either full observability…
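To make the idea of learning localized policies from offline data more concrete, here is a minimal sketch of generic tabular Fitted Q-Iteration (FQI) run on data restricted to one agent's neighborhood. It is not the paper's SCAM-FQI procedure, whose details are not given in this summary. The three-agent toy problem, the neighborhood `(0, 1)`, the reward rule, and the names `local_view`, `fitted_q_iteration`, and `greedy_action` are all assumptions made for illustration.

```python
import itertools
from collections import defaultdict


def local_view(joint_state, neighborhood):
    """Restrict a joint state (one entry per agent) to the agents in a neighborhood."""
    return tuple(joint_state[j] for j in neighborhood)


def fitted_q_iteration(transitions, actions, gamma=0.9, n_iters=50):
    """Generic tabular FQI: fit Bellman targets per (state, action) cell by averaging.

    transitions: list of (state, action, reward, next_state) with hashable states.
    """
    q = defaultdict(float)  # unseen (state, action) pairs default to 0
    for _ in range(n_iters):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, a, r, s_next in transitions:
            # Bellman target computed from the previous iterate
            target = r + gamma * max(q[(s_next, b)] for b in actions)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        q_new = defaultdict(float)
        for key, total in sums.items():
            q_new[key] = total / counts[key]  # least-squares fit for a tabular model
        q = q_new
    return q


def greedy_action(q, state, actions):
    """Pick the action with the largest estimated Q-value in the given state."""
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical toy problem: 3 agents with binary states; agent 0 is rewarded
# when its action matches agent 1's state. States never change (self-loops).
ACTIONS = (0, 1)
NEIGHBORHOOD = (0, 1)  # assumption: agent 0 observes itself and agent 1 only

joint_data = []
for joint in itertools.product((0, 1), repeat=3):
    for a in ACTIONS:
        reward = 1.0 if a == joint[1] else 0.0
        joint_data.append((joint, a, reward, joint))

# Project every transition onto the neighborhood before learning.
local_data = [
    (local_view(s, NEIGHBORHOOD), a, r, local_view(s_next, NEIGHBORHOOD))
    for s, a, r, s_next in joint_data
]

q = fitted_q_iteration(local_data, ACTIONS)
```

In this toy setup the reward depends only on information inside the neighborhood, so the projected dataset is enough for agent 0 to recover a sensible greedy policy, for example via `greedy_action(q, (0, 1), ACTIONS)`. When rewards or dynamics depend on agents outside the neighborhood, a localized learner of this kind would generally lose accuracy, which is the kind of trade-off between shared information and performance that the summary attributes to the paper's bounds.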
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: ALIGN
