Scalable Multi-Agent Offline Reinforcement Learning and the Role of Information

Riccardo Zamboni; Enrico Brunetti; Marcello Restelli

arXiv:2502.11260·cs.LG·June 6, 2025

Open Access

TL;DR

This paper introduces SCAM-FQI, a scalable multi-agent offline RL method that balances dataset collection and policy learning through structured information sharing, ensuring convergence to near-optimal policies.

Contribution

The paper proposes a novel scalable routine for dataset collection and offline learning in multi-agent RL, with theoretical convergence guarantees and bounds based on shared information.

Findings

01. SCAM-FQI converges to near-optimal policies with high probability.

02. The approach balances scalability and performance in multi-agent offline RL.

03. Empirical results support theoretical convergence and effectiveness.

Abstract

Offline Reinforcement Learning (RL) focuses on learning policies solely from a batch of previously collected data, offering the potential to leverage such datasets effectively without the need for costly or risky active exploration. While recent advances in Offline Multi-Agent RL (MARL) have shown promise, most existing methods rely either on large datasets jointly collected by all agents or on agent-specific datasets collected independently. The former approach ensures strong performance but raises scalability concerns, while the latter emphasizes scalability at the expense of performance guarantees. In this work, we propose a novel scalable routine for both dataset collection and offline learning. Agents first collect diverse datasets coherently with a pre-specified information-sharing network and subsequently learn coherent localized policies without requiring either full observability…

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: ALIGN