Intrinsic fluctuations of reinforcement learning promote cooperation
Wolfram Barfuss, Janusz Meylahn

TL;DR
This paper shows that intrinsic stochastic fluctuations in temporal-difference reinforcement learning promote cooperation in the iterated Prisoner's dilemma, doubling the cooperation rate to as much as 80%, and argues that noise can be a beneficial factor rather than only a source of error.
Contribution
It identifies intrinsic noise in the learning process, alongside a high valuation of future rewards, a low exploration rate, and a small learning rate, as a key driver of cooperation between two learning agents, an insight relevant to designing cooperative algorithms.
Findings
Intrinsic fluctuations double the cooperation rate, to as much as 80%.
Low exploration and small learning rates also promote cooperation.
Noise is a beneficial asset, not just a source of error.
Abstract
In this work, we ask for and answer what makes classical temporal-difference reinforcement learning with epsilon-greedy strategies cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which and how individual elements of the multi-agent learning setting lead to cooperation. We use the iterated Prisoner's dilemma with one-period memory as a testbed. Each of the two learning agents learns a strategy that conditions the following action choices on both agents' action choices of the last round. We find that next to a high caring for future rewards, a low exploration rate, and a small learning rate, it is primarily intrinsic stochastic fluctuations of the reinforcement learning…
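The setting described in the abstract can be sketched roughly as follows: two epsilon-greedy temporal-difference learners play the iterated Prisoner's dilemma, and each conditions its next action on both agents' actions in the previous round. This is a minimal illustration, not the authors' implementation. The payoff values, the choice of Q-learning as the temporal-difference rule, the hyperparameter defaults, and all function names are assumptions made for this example.

```python
import random

# Assumed payoffs with T > R > P > S; the paper's actual values are not given here.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
C, D = 0, 1  # cooperate, defect
PAYOFF = {(C, C): (R, R), (C, D): (S, T), (D, C): (T, S), (D, D): (P, P)}
# One-period memory: the state is the pair of actions from the last round.
STATES = [(C, C), (C, D), (D, C), (D, D)]


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice([C, D])
    return C if q_row[C] >= q_row[D] else D


def td_update(q, state, action, reward, next_state, alpha, gamma):
    """One Q-learning step (an assumed TD variant):
    move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def run(steps=10000, alpha=0.05, epsilon=0.01, gamma=0.95, seed=0):
    """Simulate two learners; return the fraction of cooperative actions
    and both Q-tables. All default values are illustrative assumptions."""
    rng = random.Random(seed)
    # One Q-table per agent: state -> [Q(cooperate), Q(defect)]
    qs = [{s: [0.0, 0.0] for s in STATES} for _ in range(2)]
    state = (C, C)  # arbitrary initial memory
    coop = 0
    for _ in range(steps):
        a0 = epsilon_greedy(qs[0][state], epsilon, rng)
        a1 = epsilon_greedy(qs[1][state], epsilon, rng)
        r0, r1 = PAYOFF[(a0, a1)]
        next_state = (a0, a1)
        td_update(qs[0], state, a0, r0, next_state, alpha, gamma)
        td_update(qs[1], state, a1, r1, next_state, alpha, gamma)
        state = next_state
        coop += (a0 == C) + (a1 == C)
    return coop / (2 * steps), qs
```

Calling `run()` with different seeds gives different trajectories, because both action sampling and the sample-by-sample updates are stochastic. That run-to-run variability is the kind of intrinsic fluctuation the abstract refers to. The sketch makes no claim to reproduce the reported cooperation rates.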
Taxonomy
Topics: Evolutionary Game Theory and Cooperation
