Emergence of Cooperation in Two-agent Repeated Games with Reinforcement Learning
Zhen-Wei Ding, Guo-Zhong Zheng, Chao-Ran Cai, Wei-Ran Cai, Li Chen, Ji-Qiang Zhang, Xu-Ming Wang

TL;DR
This paper investigates how cooperation emerges and stabilizes in a two-agent system playing the prisoner's dilemma using reinforcement learning, highlighting the roles of memory, expectations, and exploration.
Contribution
It reveals the conditions under which coordinated optimal policies emerge and remain stable, emphasizing the importance of memory and future expectations in fostering cooperation.
Findings
Strong memory and long-term expectations promote cooperation.
Tolerance to defection can lead to cooperation collapse.
Weaker memory and lower expectations favor defection dominance.
Abstract
Cooperation is the foundation of ecosystems and human society, and reinforcement learning provides crucial insight into the mechanism of its emergence. However, previous work has mostly focused on self-organization at the population level, while the fundamental dynamics at the individual level remain unclear. Here, we investigate the evolution of cooperation in a two-agent system, where each agent pursues optimal policies according to the classical Q-learning algorithm while playing the strict prisoner's dilemma. We reveal that strong memory and long-sighted expectations yield the emergence of Coordinated Optimal Policies (COPs), where both agents act like Win-Stay, Lose-Shift (WSLS) to maintain a high level of cooperation. Otherwise, players become tolerant toward their co-player's defection and cooperation loses stability in the end, where the policy all Defection…
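The setup described in the abstract can be illustrated with a minimal sketch of two Q-learning agents in a repeated prisoner's dilemma. Several details here are assumptions rather than the paper's specification: the payoff values `T, R, P, S` are hypothetical, each agent's state is taken to be the previous pair of actions, exploration is epsilon-greedy, and "memory" and "expectation" are loosely mapped to the learning rate `alpha` and discount factor `gamma` of the standard Q-learning update. The helper `wsls_action` encodes the Win-Stay, Lose-Shift rule for comparison with learned policies.

```python
import random

# Actions: 0 = cooperate (C), 1 = defect (D).
C, D = 0, 1

# Hypothetical strict prisoner's dilemma payoffs (assumed, not from the paper):
# they satisfy T > R > P > S and 2R > T + S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
PAYOFF = {(C, C): R, (C, D): S, (D, C): T, (D, D): P}  # (own, other) -> own payoff


def wsls_action(own_prev, other_prev):
    """Win-Stay, Lose-Shift: cooperate after CC or DD, defect after CD or DC."""
    return C if own_prev == other_prev else D


def greedy(q_row):
    """Pick the action with the larger Q-value (ties go to cooperation)."""
    return C if q_row[C] >= q_row[D] else D


def simulate(alpha=0.1, gamma=0.9, epsilon=0.01, steps=10000, seed=0):
    """Run two independent Q-learners; return (cooperation fraction, Q-tables)."""
    rng = random.Random(seed)
    # Assumed state: (own previous action, co-player's previous action).
    states = [(a, b) for a in (C, D) for b in (C, D)]
    q = [{s: [0.0, 0.0] for s in states} for _ in range(2)]
    prev = (rng.choice((C, D)), rng.choice((C, D)))  # actions of agent 0, agent 1
    coop = 0
    for _ in range(steps):
        # Each agent sees the last round from its own perspective.
        obs = [(prev[0], prev[1]), (prev[1], prev[0])]
        acts = []
        for i in range(2):
            if rng.random() < epsilon:
                acts.append(rng.choice((C, D)))  # explore
            else:
                acts.append(greedy(q[i][obs[i]]))  # exploit
        nxt = [(acts[0], acts[1]), (acts[1], acts[0])]
        for i in range(2):
            reward = PAYOFF[nxt[i]]
            # Standard Q-learning update toward reward + discounted best next value.
            target = reward + gamma * max(q[i][nxt[i]])
            q[i][obs[i]][acts[i]] += alpha * (target - q[i][obs[i]][acts[i]])
        coop += acts.count(C)
        prev = (acts[0], acts[1])
    return coop / (2 * steps), q
```

Calling `simulate(alpha=0.1, gamma=0.9)` returns the fraction of cooperative moves and both Q-tables; comparing `greedy(q[i][s])` with `wsls_action(*s)` over the four states `s` shows whether an agent's learned policy matches WSLS under the chosen parameters. The outcome depends on the assumed payoffs and parameters, so this sketch is not expected to reproduce the paper's results quantitatively.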
Taxonomy
Topics: Evolutionary Game Theory and Cooperation · Complex Systems and Time Series Analysis · Ecosystem dynamics and resilience
