Provably Convergent Actor-Critic in Risk-averse MARL

Yizhou Zhang; Eric Mazumdar

arXiv:2602.12386 · cs.MA · February 16, 2026

TL;DR

This paper introduces a provably convergent Actor-Critic algorithm for risk-averse multi-agent reinforcement learning in Markov games. It targets Risk-averse Quantal response Equilibria (RQE), whose regularity properties make stationary policies learnable in this setting.

Contribution

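The paper develops a two-timescale Actor-Critic method, with a fast-timescale actor and a slow-timescale critic, for learning stationary policies in infinite-horizon general-sum Markov games under risk aversion, and proves that the method converges globally.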

Findings

01. Proves global convergence with finite-sample guarantees.

02. Demonstrates superior convergence in experiments.

03. Validates the effectiveness of risk-averse equilibria in MARL.

Abstract

Learning stationary policies in infinite-horizon general-sum Markov games (MGs) remains a fundamental open problem in Multi-Agent Reinforcement Learning (MARL). While stationary strategies are preferred for their practicality, computing stationary forms of classic game-theoretic equilibria is computationally intractable -- a stark contrast to the comparative ease of solving single-agent RL or zero-sum games. To bridge this gap, we study Risk-averse Quantal response Equilibria (RQE), a solution concept rooted in behavioral game theory that incorporates risk aversion and bounded rationality. We demonstrate that RQE possesses strong regularity conditions that make it uniquely amenable to learning in MGs. We propose a novel two-timescale Actor-Critic algorithm characterized by a fast-timescale actor and a slow-timescale critic. Leveraging the regularity of RQE, we prove that this approach…
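
To make the two-timescale idea concrete, here is a minimal sketch and not the paper's algorithm. It assumes a single-state (matrix) game, exact expected payoffs instead of samples, and a plain logit quantal response with temperature `tau`; the risk-aversion component of RQE is left out entirely. The function names, step sizes, and the example payoff matrix are all hypothetical choices made for illustration. The actor uses a large step size (fast timescale) and the critic a small one (slow timescale), mirroring the structure described in the abstract.

```python
import numpy as np


def softmax(values, tau):
    """Logit quantal response: probabilities proportional to exp(values / tau)."""
    z = (values - values.max()) / tau  # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def two_timescale_qre(A, B, tau=1.0, actor_lr=0.5, critic_lr=0.05, steps=5000):
    """Illustrative two-timescale iteration in a two-player matrix game.

    A[i, j] is the row player's payoff and B[i, j] the column player's payoff
    when the row player picks i and the column player picks j.
    Assumption: payoffs are known exactly, so no sampling noise is modeled.
    """
    n, m = A.shape
    q1, q2 = np.zeros(n), np.zeros(m)                  # critics: per-action values
    pi1, pi2 = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # actors: mixed policies
    for _ in range(steps):
        # Fast actor: large step toward the quantal response to the current critic.
        pi1 = (1 - actor_lr) * pi1 + actor_lr * softmax(q1, tau)
        pi2 = (1 - actor_lr) * pi2 + actor_lr * softmax(q2, tau)
        # Slow critic: small step toward expected payoffs under the opponent's policy.
        q1 = (1 - critic_lr) * q1 + critic_lr * (A @ pi2)
        q2 = (1 - critic_lr) * q2 + critic_lr * (B.T @ pi1)
    return pi1, pi2, q1, q2


# Hypothetical zero-sum example: the column player's payoff is the negative of A.
A = np.array([[2.0, -1.0], [-1.0, 1.0]])
pi1, pi2, q1, q2 = two_timescale_qre(A, -A)
```

At a fixed point of this iteration each policy is the logit response to the expected payoffs induced by the other, which is the (risk-neutral) quantal response equilibrium condition. Extending the sketch toward RQE would require replacing the expected payoff in the critic update with a risk-adjusted value, which is not attempted here.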

Taxonomy

Topics: Reinforcement Learning in Robotics · Game Theory and Applications · Adaptive Dynamic Programming Control