Towards Principled Unsupervised Multi-Agent Reinforcement Learning
Riccardo Zamboni, Mirco Mutti, Marcello Restelli

TL;DR
This paper explores unsupervised pre-training in multi-agent reinforcement learning, analyzing theoretical challenges and proposing a scalable decentralized algorithm that balances tractability and performance.
Contribution
It characterizes alternative formulations of unsupervised multi-agent RL, introduces a practical trust-region algorithm, and demonstrates the effectiveness of mixture entropy optimization.
Findings
Theoretical analysis highlights the complexity of unsupervised multi-agent RL.
A scalable decentralized algorithm is proposed for practical applications.
Mixture entropy optimization offers a good trade-off between tractability and performance.
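As a rough illustration of the last finding, the sketch below computes a mixture entropy for two agents. This summary does not define the term, so the code assumes it means the entropy of the uniform average of the agents' individual state distributions over a shared state space. The names entropy, mixture_entropy, agent_a, and agent_b are hypothetical and are not taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(x * math.log(x) for x in p if x > 0)

def mixture_entropy(per_agent_dists):
    """Entropy of the uniform average of per-agent state distributions.

    Assumption: all agents share one state space, so the lists have the
    same length and index s refers to the same state for every agent.
    """
    n_agents = len(per_agent_dists)
    n_states = len(per_agent_dists[0])
    mixture = [
        sum(d[s] for d in per_agent_dists) / n_agents
        for s in range(n_states)
    ]
    return entropy(mixture)

# Hypothetical example: two agents that each cover a different half
# of a 4-state space.
agent_a = [0.5, 0.5, 0.0, 0.0]
agent_b = [0.0, 0.0, 0.5, 0.5]

print(entropy(agent_a), entropy(agent_b))
print(mixture_entropy([agent_a, agent_b]))
```

Under this assumed definition, the mixture is uniform over all four states even though each agent visits only two, so the objective rewards agents for covering different regions while needing only per-agent distributions rather than the full joint one.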
Abstract
In reinforcement learning, we typically refer to unsupervised pre-training when we aim to pre-train a policy without a priori access to the task specification, i.e., rewards, to be later employed for efficient learning of downstream tasks. In single-agent settings, the problem has been extensively studied and mostly understood. A popular approach, called task-agnostic exploration, casts the unsupervised objective as maximizing the entropy of the state distribution induced by the agent's policy, from which principles and methods follow. In contrast, little is known about it in multi-agent settings, which are ubiquitous in the real world. What are the pros and cons of alternative problem formulations in this setting? How hard is the problem in theory, and how can we solve it in practice? In this paper, we address these questions by first characterizing those alternative formulations and…
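The single-agent objective mentioned in the abstract, the entropy of the state distribution induced by a policy, can be illustrated with a minimal sketch. It assumes a small discrete state space and estimates the distribution from visit counts. The function state_entropy and the two rollouts are hypothetical and are not the authors' estimator.

```python
import math
from collections import Counter

def state_entropy(visited_states):
    """Shannon entropy (in nats) of the empirical state-visitation distribution."""
    counts = Counter(visited_states)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical rollouts in a 4-state environment.
narrow_rollout = [0, 0, 0, 1, 0, 0, 0, 0]   # policy that mostly stays in state 0
uniform_rollout = [0, 1, 2, 3, 0, 1, 2, 3]  # policy that visits all states evenly

print(state_entropy(narrow_rollout))
print(state_entropy(uniform_rollout))
```

A policy that spreads its visits evenly over the states scores higher than one that stays in a single state, which is why this quantity can serve as a reward-free exploration signal.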
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Fault Detection and Control Systems
