The Price of Paranoia: Robust Risk-Sensitive Cooperation in Non-Stationary Multi-Agent Reinforcement Learning

Deep Kumar Ganguly; Chandradithya S Jonnalagadda; Pratham Chintamani; Adithya Ananth

arXiv:2604.15695·cs.GT·April 20, 2026

TL;DR

This paper investigates the fragility of cooperative equilibria in non-stationary multi-agent reinforcement learning, revealing how standard risk-neutral learning destabilizes cooperation and proposing a robustness approach that improves stability.

Contribution

The paper introduces a robustness method that targets policy-gradient variance, improving the stability of cooperation in non-stationary multi-agent environments.

Findings

01. Standard risk-neutral learning causes exponential instability of cooperation.

02. Risk-averse objectives worsen instability by penalizing cooperative actions.

03. The proposed method stabilizes cooperation by modulating gradient updates based on partner unpredictability.

Abstract

Cooperative equilibria are fragile. When agents learn alongside each other rather than in a fixed environment, the process of learning destabilizes the cooperation they are trying to sustain: every gradient step an agent takes shifts the distribution of actions its partner will play, turning a cooperative partner into a source of stochastic noise precisely where the cooperation decision is most sensitive. We study how this co-learning noise propagates through the structure of coordination games, and find that the cooperative equilibrium, even when strongly Pareto-dominant, is exponentially unstable under standard risk-neutral learning, collapsing irreversibly once partner noise crosses the game's critical cooperation threshold. The natural response, applying distributional robustness to hedge against partner uncertainty, makes things strictly worse: risk-averse return objectives penalize…
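To make the idea of a critical cooperation threshold concrete, here is a minimal sketch for a 2x2 stag-hunt coordination game. It is not the paper's model: the payoff values and the function names `cooperation_threshold`, `expected_payoffs`, and `best_response` are assumptions chosen for illustration. It only shows the textbook best-response calculation: cooperating is optimal for a risk-neutral agent while the partner cooperates with probability at least p*, so partner noise above 1 - p* flips the best response to defection.

```python
def cooperation_threshold(R, S, T, P):
    """Smallest partner cooperation probability p* at which cooperating
    is a best response for a risk-neutral agent.

    Payoffs to the row player (assumed stag-hunt ordering R > T >= P > S):
      R: both cooperate      S: I cooperate, partner defects
      T: I defect, partner cooperates      P: both defect
    Derived from p*R + (1-p)*S >= p*T + (1-p)*P.
    """
    return (P - S) / ((R - T) + (P - S))


def expected_payoffs(p, R, S, T, P):
    """Expected payoff of cooperating and of defecting when the partner
    cooperates with probability p."""
    coop = p * R + (1 - p) * S
    defect = p * T + (1 - p) * P
    return coop, defect


def best_response(p, R, S, T, P):
    """Risk-neutral best response to a partner who cooperates with probability p."""
    coop, defect = expected_payoffs(p, R, S, T, P)
    return "cooperate" if coop >= defect else "defect"


# Hypothetical payoffs (illustrative only, not taken from the paper).
R, S, T, P = 4.0, 0.0, 3.0, 2.0

p_star = cooperation_threshold(R, S, T, P)
max_noise = 1.0 - p_star  # largest partner deviation rate that still supports cooperation

if __name__ == "__main__":
    print(f"threshold p* = {p_star:.3f}, tolerated noise = {max_noise:.3f}")
    for eps in (0.1, 0.3, 0.4):
        # eps is the probability that a nominally cooperative partner deviates.
        print(eps, best_response(1.0 - eps, R, S, T, P))
```

With these assumed payoffs the threshold is p* = 2/3, so mutual cooperation remains a best response only while the partner deviates less than one third of the time, even though mutual cooperation pays twice as much as mutual defection.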
