Multi-Agent Advisor Q-Learning
Sriram Ganapathi Subramanian, Matthew E. Taylor, Kate Larson, Mark Crowley

TL;DR
This paper introduces a framework and two algorithms, ADMIRAL-DM and ADMIRAL-AE, for improving multi-agent reinforcement learning by incorporating advice from sub-optimal advisors, with theoretical guarantees and extensive empirical validation.
Contribution
It presents a novel principled approach and algorithms for integrating advisor recommendations into multi-agent Q-learning, addressing sample complexity and convergence issues.
Findings
Algorithms improve learning efficiency and stability.
Performance compares favorably to existing baselines.
Scales effectively to large state-action spaces.
Abstract
In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL) but there are still numerous challenges, such as high sample complexity and slow convergence to stable policies, that need to be overcome before wide-spread deployment is possible. However, many real-world environments already, in practice, deploy sub-optimal or heuristic approaches for generating policies. An interesting question that arises is how to best use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online sub-optimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning based…
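To make the idea of learning from a sub-optimal advisor concrete, the following is a minimal sketch and not the paper's ADMIRAL-DM or ADMIRAL-AE algorithm. It assumes a single agent, a tabular Q-function, and a toy five-cell corridor; the function names, the 80%-correct advisor, and the decay schedule for `advice_prob` are all hypothetical choices made for illustration. The agent follows the advisor's recommendation with a probability that decays over episodes and otherwise acts epsilon-greedily on its own Q-values.

```python
import random
from collections import defaultdict


def select_action(q, state, actions, advisor, advice_prob, epsilon, rng):
    """Follow the advisor with probability advice_prob; otherwise act
    epsilon-greedily on the learned Q-values."""
    if rng.random() < advice_prob:
        return advisor(state)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Standard one-step tabular Q-learning update."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Train on a hypothetical corridor: cells 0..4, goal at cell 4."""
    rng = random.Random(seed)
    actions = [0, 1]  # 0 = move left, 1 = move right
    goal = 4
    q = defaultdict(float)

    # Assumed sub-optimal advisor: recommends the correct action
    # (move right) only 80% of the time.
    def advisor(state):
        return 1 if rng.random() < 0.8 else 0

    advice_prob = 1.0  # start by relying fully on the advisor
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # step cap per episode
            action = select_action(
                q, state, actions, advisor, advice_prob, epsilon, rng
            )
            if action == 1:
                next_state = min(goal, state + 1)
            else:
                next_state = max(0, state - 1)
            reward = 1.0 if next_state == goal else 0.0
            q_update(q, state, action, reward, next_state, actions, alpha, gamma)
            state = next_state
            if state == goal:
                break
        advice_prob *= 0.97  # assumed decay: rely less on advice over time
    return q


if __name__ == "__main__":
    q = train()
    for s in range(4):
        print(s, round(q[(s, 0)], 3), round(q[(s, 1)], 3))
```

Decaying the advice probability is one simple way to let early exploration benefit from the advisor while leaving the final policy to the learned Q-values; the multi-agent, general-sum setting described in the abstract requires considerably more machinery than this sketch shows.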
Taxonomy
Topics: Auction Theory and Applications · Reinforcement Learning in Robotics · Sports Analytics and Performance
Methods: Q-Learning
