Symmetric equilibrium of multi-agent reinforcement learning in repeated prisoner's dilemma

Yuki Usui; Masahiko Ueda

arXiv:2101.11861·cs.GT·June 2, 2021

TL;DR

This paper analyzes the repeated prisoner's dilemma in which two players alternately use reinforcement learning to find their optimal memory-one strategies, and shows that Win-stay Lose-shift, Grim, and always-defect can each form a symmetric equilibrium of this learning process.

Contribution

It solves the simultaneous Bellman optimality equations of the two learners theoretically and identifies which deterministic memory-one strategies can form symmetric equilibria.

Findings

01

Among all deterministic memory-one strategies, Win-stay Lose-shift, Grim, and always-defect can form symmetric equilibria of the mutual reinforcement learning process.

02

Theoretical characterization of the equilibrium strategies reached when both players alternately learn memory-one strategies by reinforcement learning.

03

Analytical solution of the simultaneous Bellman optimality equations for the repeated prisoner's dilemma.

Abstract

We investigate the repeated prisoner's dilemma game where both players alternately use reinforcement learning to obtain their optimal memory-one strategies. We theoretically solve the simultaneous Bellman optimality equations of reinforcement learning. We find that, among all deterministic memory-one strategies, the Win-stay Lose-shift strategy, the Grim strategy, and the strategy which always defects can form a symmetric equilibrium of the mutual reinforcement learning process.
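As a rough illustration of the kind of check the abstract describes, the following Python sketch fixes one player's deterministic memory-one strategy, solves the other player's Bellman optimality equation Q(s, a) = r(a, b) + gamma * max_a' Q(s', a') by value iteration, and tests whether the resulting best response is the same strategy. Value iteration is a numerical stand-in here, not the paper's analytical solution. The payoffs (T, R, P, S) = (4, 3, 1, 0), the discount factor 0.9, the tie-breaking rule, and all function and variable names are assumptions made for illustration and are not taken from the paper.

```python
import itertools

C, D = 0, 1  # cooperate, defect

# Assumed (hypothetical) payoffs and discount factor; not from the paper.
T, R, P, S = 4.0, 3.0, 1.0, 0.0
GAMMA = 0.9

# A state is the previous round's outcome seen by one player: (own_prev, other_prev).
STATES = list(itertools.product((C, D), repeat=2))
# PAYOFF[(my_action, opponent_action)] is my one-round reward.
PAYOFF = {(C, C): R, (C, D): S, (D, C): T, (D, D): P}


def best_response(opponent, gamma=GAMMA, tol=1e-12, max_iter=10_000):
    """Best deterministic memory-one reply to a fixed opponent strategy.

    `opponent` maps the opponent's own view (own_prev, other_prev) to C or D.
    The Bellman optimality equation is solved numerically by value iteration.
    """
    q = {(s, a): 0.0 for s in STATES for a in (C, D)}
    for _ in range(max_iter):
        new_q = {}
        for s in STATES:
            own_prev, opp_prev = s
            # The opponent sees the same outcome with the roles swapped.
            b = opponent[(opp_prev, own_prev)]
            for a in (C, D):
                nxt = (a, b)  # next state from the learner's point of view
                new_q[(s, a)] = PAYOFF[(a, b)] + gamma * max(q[(nxt, C)], q[(nxt, D)])
        delta = max(abs(new_q[k] - q[k]) for k in q)
        q = new_q
        if delta < tol:
            break
    # Ties are broken in favour of cooperation (an arbitrary choice).
    return {s: (C if q[(s, C)] >= q[(s, D)] else D) for s in STATES}


def is_symmetric_equilibrium(strategy):
    """True if the strategy is a best response to itself."""
    return best_response(strategy) == strategy


WSLS = {(C, C): C, (C, D): D, (D, C): D, (D, D): C}  # Win-stay Lose-shift
GRIM = {(C, C): C, (C, D): D, (D, C): D, (D, D): D}
ALLD = {s: D for s in STATES}
ALLC = {s: C for s in STATES}
TFT = {(own, other): other for own, other in STATES}  # Tit-for-Tat

if __name__ == "__main__":
    named = {"WSLS": WSLS, "Grim": GRIM, "AllD": ALLD, "AllC": ALLC, "TFT": TFT}
    for name, strategy in named.items():
        print(name, is_symmetric_equilibrium(strategy))
```

Under these assumed parameters, Win-stay Lose-shift, Grim, and always-defect each come out as a best response to themselves, while always-cooperate and Tit-for-Tat do not. The outcome of this check depends on the payoffs and discount factor chosen, so other parameter values may give a different picture.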
