Solving Common-Payoff Games with Approximate Policy Iteration
Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot

TL;DR
This paper introduces CAPI, an algorithm that combines common knowledge with deep reinforcement learning to discover optimal joint policies in small common-payoff games, including games where the approximations used by prior methods such as BAD prevent them from finding an optimal solution.
Contribution
CAPI is a novel algorithm that prioritizes discovering optimal joint policies over scalability, improving upon prior methods like BAD in small game settings.
Findings
CAPI successfully finds optimal joint policies in small common-payoff games.
CAPI outperforms other multi-agent reinforcement learning algorithms at discovering optimal joint policies on the games to which it scales.
CAPI does not scale to large games like Hanabi, but excels in smaller settings.
Abstract
For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is difficult -- computing even an epsilon-optimal joint policy is a NEXP-complete problem. Nevertheless, a recently rediscovered insight -- that a team of agents can coordinate via common knowledge -- has given rise to algorithms capable of finding optimal joint policies in small common-payoff games. The Bayesian action decoder (BAD) leverages this insight and deep reinforcement learning to scale to games as large as two-player Hanabi. However, the approximations it uses to do so prevent it from discovering optimal joint policies even in games small enough to brute force optimal solutions. This work proposes CAPI, a novel algorithm which, like BAD, combines common knowledge with deep…
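Illustration
The abstract refers to games "small enough to brute force optimal solutions". The sketch below shows what that means on a toy example. It is not CAPI or BAD, and the game is a hypothetical one invented for illustration: a sender privately observes a uniformly random bit and picks an action, a receiver sees only that action and guesses the bit, and both players receive the same payoff of 1 for a correct guess. All names in the code (`expected_payoff`, `brute_force_optimal_joint_policy`) are assumptions, not taken from the paper.

```python
from itertools import product

# Hypothetical two-player common-payoff signaling game (not from the paper).
OBSERVATIONS = (0, 1)  # sender's private bit, drawn uniformly at random
ACTIONS = (0, 1)       # action set shared by sender and receiver


def expected_payoff(sender, receiver):
    """Expected common payoff of a deterministic joint policy.

    sender[x] is the sender's action after observing bit x.
    receiver[a] is the receiver's guess after seeing sender action a.
    """
    total = 0.0
    for x in OBSERVATIONS:
        a = sender[x]
        b = receiver[a]
        total += 0.5 * (1.0 if b == x else 0.0)
    return total


def brute_force_optimal_joint_policy():
    """Enumerate every deterministic joint policy and keep the best ones."""
    best_value = float("-inf")
    best_policies = []
    for sender in product(ACTIONS, repeat=len(OBSERVATIONS)):
        for receiver in product(ACTIONS, repeat=len(ACTIONS)):
            value = expected_payoff(sender, receiver)
            if value > best_value:
                best_value, best_policies = value, [(sender, receiver)]
            elif value == best_value:
                best_policies.append((sender, receiver))
    return best_value, best_policies


if __name__ == "__main__":
    value, policies = brute_force_optimal_joint_policy()
    print("optimal value:", value)
    for sender, receiver in policies:
        print("sender:", sender, "receiver:", receiver)
```

Here there are only 4 x 4 = 16 deterministic joint policies, so enumeration is trivial. The count grows exponentially with the number of information states, which is why exhaustive search is limited to very small games.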
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
