Symmetric equilibrium of multi-agent reinforcement learning in repeated prisoner's dilemma
Yuki Usui, Masahiko Ueda

TL;DR
This paper analyzes symmetric equilibria of multi-agent reinforcement learning in the repeated prisoner's dilemma and shows that the Win-stay Lose-shift, Grim, and always-defect strategies can each form a symmetric equilibrium among deterministic memory-one strategies.
Contribution
It solves the simultaneous Bellman optimality equations for two players who alternately learn memory-one strategies by reinforcement learning, and identifies which deterministic memory-one strategies can form symmetric equilibria.
Findings
The Win-stay Lose-shift, Grim, and always-defect strategies can each form a symmetric equilibrium.
The equilibrium strategies are characterized theoretically among all deterministic memory-one strategies.
The simultaneous Bellman optimality equations for the repeated prisoner's dilemma are solved analytically.
Abstract
We investigate the repeated prisoner's dilemma game where both players alternately use reinforcement learning to obtain their optimal memory-one strategies. We theoretically solve the simultaneous Bellman optimality equations of reinforcement learning. We find that the Win-stay Lose-shift strategy, the Grim strategy, and the strategy which always defects can form a symmetric equilibrium of the mutual reinforcement learning process amongst all deterministic memory-one strategies.
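The symmetric-equilibrium condition described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' analytical method: it fixes the opponent's deterministic memory-one strategy, solves the learner's Bellman optimality equation by value iteration, and checks whether the learner's best response is the same strategy. The payoff values (R, S, T, P) = (3, 0, 4, 1), the discount factor 0.9, the state encoding, and all function names are assumptions made for illustration; whether a given strategy passes the check depends on these parameters.

```python
from itertools import product

C, D = 0, 1  # cooperate, defect

# Assumed payoffs for (own action, opponent action): R=3, S=0, T=4, P=1.
PAYOFF = {(C, C): 3.0, (C, D): 0.0, (D, C): 4.0, (D, D): 1.0}
GAMMA = 0.9  # assumed discount factor

# A state is (own previous action, opponent's previous action).
STATES = list(product((C, D), repeat=2))


def q_value(v, opponent, state, action, gamma):
    """One-step lookahead against a fixed memory-one opponent."""
    own, opp = state
    reply = opponent[(opp, own)]  # the opponent sees the state from its side
    return PAYOFF[(action, reply)] + gamma * v[(action, reply)]


def best_response(opponent, gamma=GAMMA, sweeps=2000):
    """Greedy policy from value iteration on the Bellman optimality equation."""
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        v = {s: max(q_value(v, opponent, s, a, gamma) for a in (C, D))
             for s in STATES}
    # Ties, if any, are broken toward cooperation (an arbitrary choice).
    return {s: C if q_value(v, opponent, s, C, gamma)
            >= q_value(v, opponent, s, D, gamma) else D
            for s in STATES}


def is_symmetric_equilibrium(strategy, gamma=GAMMA):
    """True if the strategy is a best response to itself in every state."""
    return best_response(strategy, gamma) == strategy


ALL_D = {s: D for s in STATES}
GRIM = {s: C if s == (C, C) else D for s in STATES}
WSLS = {s: C if s[0] == s[1] else D for s in STATES}

for name, strategy in [("always defect", ALL_D), ("Grim", GRIM), ("WSLS", WSLS)]:
    print(name, is_symmetric_equilibrium(strategy))
```

Under the assumed payoffs and discount factor, each of the three strategies named in the abstract is a best response to itself. With other parameters the check can fail; for example, the Win-stay Lose-shift check in this sketch requires R(1 + γ) > T + γP.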
