How Exploration Breaks Cooperation in Shared-Policy Multi-Agent Reinforcement Learning
Yi-Ning Weng, Hsuan-Wei Lee

TL;DR
This paper reveals that standard exploration in shared-policy multi-agent reinforcement learning can cause a collapse of cooperation due to representational failures, even in environments where cooperation is stable and payoff-dominant.
Contribution
It identifies a fundamental failure mode of shared-policy MARL caused by exploration-induced biases, and offers structural insights for designing better multi-agent learning systems.
Findings
Shared DQN converges to low-cooperation regimes.
Collapse persists across network sizes and exploration schedules.
Removing parameter sharing or maintaining independent representations prevents collapse.
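The contrast between shared and independent parameters can be illustrated with a small tabular sketch. This is not the paper's method or environment: it replaces the DQN with a lookup table, and the observation indices, reward value, learning rate, and all function names (`make_q`, `epsilon_greedy`, `td_update`) are hypothetical choices made for illustration. It shows only the coupling mechanism: when all agents read from one set of parameters, a single exploratory update by one agent changes the greedy action of every agent.

```python
import random

COOPERATE, DEFECT = 0, 1

def make_q(n_obs, n_actions=2):
    """Create a zero-initialised Q-table (a tabular stand-in for a Q-network)."""
    return [[0.0] * n_actions for _ in range(n_obs)]

def epsilon_greedy(q, obs, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[obs]))
    row = q[obs]
    # Ties resolve to the lowest action index (COOPERATE here).
    return max(range(len(row)), key=row.__getitem__)

def td_update(q, obs, action, reward, next_obs, alpha=0.5, gamma=0.9):
    """One-step Q-learning update applied in place."""
    target = reward + gamma * max(q[next_obs])
    q[obs][action] += alpha * (target - q[obs][action])

n_agents, n_obs = 3, 2

# Shared policy: every agent holds a reference to the same table.
shared_q = make_q(n_obs)
shared_tables = [shared_q] * n_agents

# Independent policies: one table per agent.
independent_tables = [make_q(n_obs) for _ in range(n_agents)]

# Hypothetical event: agent 0 explores DEFECT in observation 0
# and receives a reward of 1.0 (an assumed value).
for tables in (shared_tables, independent_tables):
    td_update(tables[0], obs=0, action=DEFECT, reward=1.0, next_obs=0)

rng = random.Random(0)
# epsilon=0.0 gives the purely greedy action for each agent in observation 0.
shared_greedy = [epsilon_greedy(q, 0, 0.0, rng) for q in shared_tables]
independent_greedy = [epsilon_greedy(q, 0, 0.0, rng) for q in independent_tables]

print("shared:     ", shared_greedy)
print("independent:", independent_greedy)
```

In the shared case, agent 0's single update shifts all three agents toward defection in that observation. In the independent case, only agent 0 is affected. Whether such coupling leads to a lasting collapse depends on the environment and training dynamics, which this sketch does not model.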
Abstract
Multi-agent reinforcement learning in dynamic social dilemmas commonly relies on parameter sharing to enable scalability. We show that in shared-policy Deep Q-Network learning, standard exploration can induce a robust and systematic collapse of cooperation even in environments where fully cooperative equilibria are stable and payoff dominant. Through controlled experiments, we demonstrate that shared DQN converges to stable but persistently low-cooperation regimes. This collapse is not caused by reward misalignment, noise, or insufficient training, but by a representational failure arising from partial observability combined with parameter coupling across heterogeneous agent states. Exploration-driven updates bias the shared representation toward locally dominant defection responses, which then propagate across agents and suppress cooperative learning. We confirm that the failure…
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Game Theory and Cooperation · Action Observation and Synchronization
