How Exploration Breaks Cooperation in Shared-Policy Multi-Agent Reinforcement Learning

Yi-Ning Weng; Hsuan-Wei Lee

arXiv:2601.05509·cs.MA·January 12, 2026

Open Access

TL;DR

This paper reveals that standard exploration in shared-policy multi-agent reinforcement learning can cause a collapse of cooperation due to representational failures, even in environments where cooperation is stable and payoff-dominant.

Contribution

It identifies a fundamental failure mode of shared-policy MARL caused by exploration-induced biases, and offers structural insights for designing better multi-agent learning systems.

Findings

01 Shared DQN converges to low-cooperation regimes.

02 Collapse persists across network sizes and exploration schedules.

03 Removing parameter sharing or maintaining independent representations prevents collapse.

Abstract

Multi-agent reinforcement learning in dynamic social dilemmas commonly relies on parameter sharing to enable scalability. We show that in shared-policy Deep Q-Network learning, standard exploration can induce a robust and systematic collapse of cooperation even in environments where fully cooperative equilibria are stable and payoff dominant. Through controlled experiments, we demonstrate that shared DQN converges to stable but persistently low-cooperation regimes. This collapse is not caused by reward misalignment, noise, or insufficient training, but by a representational failure arising from partial observability combined with parameter coupling across heterogeneous agent states. Exploration-driven updates bias the shared representation toward locally dominant defection responses, which then propagate across agents and suppress cooperative learning. We confirm that the failure…
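
The parameter coupling described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it swaps the Deep Q-Network for a tabular Q-learner, and the names `make_q_tables`, `q_update`, and `epsilon_greedy`, the two-action cooperate/defect encoding, and the values of `alpha` and `gamma` are all assumptions chosen for illustration. It only shows the structural point that, under sharing, an update driven by one agent's (possibly exploratory) action changes the values every other agent acts on.

```python
import random

# Hypothetical action encoding for a two-action social dilemma.
COOPERATE, DEFECT = 0, 1


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def make_q_tables(n_agents, n_obs, n_actions, shared):
    """Return one Q-table per agent.

    With shared=True every agent aliases the same table (a stand-in for
    parameter sharing); with shared=False each agent owns a separate table.
    """
    if shared:
        table = [[0.0] * n_actions for _ in range(n_obs)]
        return [table] * n_agents
    return [[[0.0] * n_actions for _ in range(n_obs)] for _ in range(n_agents)]


def q_update(table, obs, action, reward, next_obs, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (assumed stand-in for a DQN gradient step)."""
    target = reward + gamma * max(table[next_obs])
    table[obs][action] += alpha * (target - table[obs][action])


# Agent 0 receives a reward after defecting in observation 0.
shared = make_q_tables(n_agents=3, n_obs=2, n_actions=2, shared=True)
independent = make_q_tables(n_agents=3, n_obs=2, n_actions=2, shared=False)
q_update(shared[0], obs=0, action=DEFECT, reward=1.0, next_obs=0)
q_update(independent[0], obs=0, action=DEFECT, reward=1.0, next_obs=0)

# Under sharing, agent 1's estimate moved too; with independent tables it did not.
shared_agent1_value = shared[1][0][DEFECT]
independent_agent1_value = independent[1][0][DEFECT]
```

In the shared case agent 1's greedy choice in observation 0 now favors defection even though agent 1 never experienced that transition, while the independent learner's estimate is untouched. Whether this coupling produces the collapse the paper reports depends on the environment and training details, which this sketch does not model.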

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Game Theory and Cooperation · Action Observation and Synchronization