Equilibrium Selection in Multi-Agent Policy Gradients via Opponent-Aware Basin Entry

Yevhen Shcherbinin; Arina Redina; Maxim Kalpin; Vlad Kochetov

arXiv:2605.18078 · cs.LG · May 19, 2026

TL;DR

This paper studies which equilibrium multi-agent policy-gradient methods converge to, and shows that opponent-aware (peer-learning) corrections raise the probability of entering the basin of a target equilibrium set, such as payoff-dominant cooperative equilibria.

Contribution

The paper identifies the peer-learning correction as the main mechanism for equilibrium selection and proposes annealing that correction to recover standard convergence guarantees.

Findings

01 Peer-aware updates increase entry into cooperative basins.

02 The method decomposes into policy gradient plus opponent-aware corrections.

03 Experiments show improved cooperation in game environments.
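
The decomposition in finding 02 can be written schematically as below. This is only a sketch of the structure described in the abstract: the symbols \hat{g}_i, c_i^{own}, c_i^{peer}, \xi_i, and b_i are notation assumed here for illustration and may differ from the paper's own.

```latex
% Schematic form of the finite-unroll Meta-MAPG update for agent i.
% All symbol names are assumed for illustration, not taken from the paper.
\hat{g}_i
  = \underbrace{\nabla_{\theta_i} V_i(\theta)}_{\text{ordinary policy gradient}}
  + \underbrace{c_i^{\mathrm{own}}(\theta)}_{\text{own-learning correction}}
  + \underbrace{c_i^{\mathrm{peer}}(\theta)}_{\text{peer-learning correction}}
  + \underbrace{\xi_i}_{\text{sampling noise}}
  + \underbrace{b_i}_{\text{finite-unroll bias}}
```

Read this way, the equilibrium-selection claim concerns the peer-learning term, while the last two terms are the error sources the abstract describes as controlled.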

Abstract

Multi-agent policy-gradient methods have been shown to converge locally near stable Nash equilibria. Local convergence, however, does not determine which equilibrium is reached. We study this question through basin-entry probability with respect to a target set of equilibria selected by an external criterion, such as payoff dominance. For finite-unroll Meta-MAPG, we show that the update decomposes into ordinary policy gradient plus own-learning and peer-learning corrections, with controlled sampling noise and finite-unroll bias. We identify the peer-learning correction as the main equilibrium-selection mechanism: under a local alignment condition, the probability of entering the certified attraction region of the target stable-Nash set increases, relative to ordinary policy gradient. Because persistent correction may shift zero-update points of the original game, annealing the…
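
To make basin-entry probability concrete, here is a minimal toy sketch in Python. It is not the paper's method or experimental setup. It assumes a 2x2 stag hunt with sigmoid-parameterized policies, exact gradients (no sampling noise), a first-order LOLA-style lookahead term standing in for the peer-learning correction, and a hypothetical geometric annealing schedule. The function name `basin_entry_probability`, the payoffs, and all hyperparameters are illustrative choices. The target region {p > 0.75, q > 0.75} is used because, in this game, both exact gradients point toward Stag once both players choose Stag with probability above 3/4.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def basin_entry_probability(eta, decay=0.99, n_init=2000, steps=500,
                            lr=0.5, seed=0):
    """Fraction of random initializations that end inside the region
    {p > 0.75, q > 0.75} around the (Stag, Stag) equilibrium.

    Assumed toy game: Stag pays 4 if both play Stag and 0 otherwise;
    Hare always pays 3. With p, q the Stag probabilities,
        V1 = 4*p*q + 3*(1 - p),   V2 = 4*p*q + 3*(1 - q).
    eta scales the peer-aware correction; eta = 0 is ordinary gradient ascent.
    """
    rng = np.random.default_rng(seed)
    p0 = rng.uniform(0.05, 0.95, size=n_init)
    q0 = rng.uniform(0.05, 0.95, size=n_init)
    a = np.log(p0 / (1.0 - p0))  # logit of player 1's Stag probability
    b = np.log(q0 / (1.0 - q0))  # logit of player 2's Stag probability

    for t in range(steps):
        p, q = sigmoid(a), sigmoid(b)
        dp, dq = p * (1.0 - p), q * (1.0 - q)  # sigmoid derivatives

        # Ordinary exact policy gradients: dV1/da and dV2/db.
        grad_a = (4.0 * q - 3.0) * dp
        grad_b = (4.0 * p - 3.0) * dq

        # First-order lookahead correction (assumed stand-in for the
        # peer-learning term): (dV1/db) * d2V2/(da db), and symmetrically.
        eta_t = eta * decay ** t  # hypothetical geometric annealing
        corr_a = eta_t * (4.0 * p * dq) * (4.0 * dp * dq)
        corr_b = eta_t * (4.0 * q * dp) * (4.0 * dp * dq)

        a = a + lr * (grad_a + corr_a)
        b = b + lr * (grad_b + corr_b)

    p, q = sigmoid(a), sigmoid(b)
    return float(np.mean((p > 0.75) & (q > 0.75)))


if __name__ == "__main__":
    # Same seed, so both runs start from identical initial policies.
    print("ordinary gradient:", basin_entry_probability(eta=0.0))
    print("with correction:  ", basin_entry_probability(eta=1.0))
```

In this particular game the correction term is non-negative, so it can only push both players toward Stag. That is a property of the toy setup rather than a general guarantee; the abstract states the general result only under a local alignment condition.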
