Solving Common-Payoff Games with Approximate Policy Iteration

Samuel Sokota; Edward Lockhart; Finbarr Timbers; Elnaz Davoodi; Ryan D'Orazio; Neil Burch; Martin Schmid; Michael Bowling; Marc Lanctot

arXiv:2101.04237·cs.AI·January 13, 2021



TL;DR

This paper introduces CAPI, an algorithm that, like BAD, combines common knowledge with deep reinforcement learning, but prioritizes finding optimal joint policies in small common-payoff games over scaling to large ones.

Contribution

CAPI is a novel algorithm that prioritizes discovering optimal joint policies over scalability, improving upon prior methods like BAD in small game settings.

Findings

01. CAPI finds optimal joint policies in small common-payoff games.

02. In these small games, CAPI finds better joint policies than the other multi-agent reinforcement learning algorithms it is compared against.

03. CAPI does not scale to large games such as Hanabi; it is aimed at smaller settings.

Abstract

For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is difficult -- computing even an epsilon-optimal joint policy is a NEXP-complete problem. Nevertheless, a recently rediscovered insight -- that a team of agents can coordinate via common knowledge -- has given rise to algorithms capable of finding optimal joint policies in small common-payoff games. The Bayesian action decoder (BAD) leverages this insight and deep reinforcement learning to scale to games as large as two-player Hanabi. However, the approximations it uses to do so prevent it from discovering optimal joint policies even in games small enough to brute force optimal solutions. This work proposes CAPI, a novel algorithm which, like BAD, combines common knowledge with deep…
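To make "games small enough to brute force optimal solutions" concrete, here is a minimal sketch of brute-force search over deterministic joint policies in a toy one-step common-payoff game. The game itself (the observation set, action set, and `payoff` function) is a hypothetical example invented for illustration; it is not taken from the paper, and this enumeration is not CAPI or BAD. Each player sees only its own observation, so a policy maps a private observation to an action, and both players receive the same reward.

```python
import itertools

# Hypothetical toy game (not from the paper): each player privately observes
# a uniformly random bit and picks a binary action. Both get the same payoff.
OBS = (0, 1)
ACTIONS = (0, 1)


def payoff(o1, o2, a1, a2):
    """Shared reward: players must match actions; matching on player 1's
    observation pays more, but player 2 cannot see that observation."""
    if a1 != a2:
        return 0.0
    return 1.0 if a1 == o1 else 0.5


def expected_return(pi1, pi2):
    """Expected shared payoff of a deterministic joint policy.
    pi1[o] and pi2[o] give each player's action for private observation o."""
    total = 0.0
    for o1, o2 in itertools.product(OBS, OBS):
        total += payoff(o1, o2, pi1[o1], pi2[o2])
    return total / (len(OBS) ** 2)


def brute_force():
    """Enumerate every pair of deterministic policies and keep the best."""
    policies = list(itertools.product(ACTIONS, repeat=len(OBS)))
    best = max(
        itertools.product(policies, policies),
        key=lambda joint: expected_return(*joint),
    )
    return best, expected_return(*best)


best_joint_policy, best_value = brute_force()
```

In this toy game the best joint policy has both players ignore their observations and play the same fixed action, for an expected return of 0.75. The number of joint policies grows exponentially with the number of observations per player, which is why this kind of enumeration is only feasible for very small games.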



Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning