NePPO: Near-Potential Policy Optimization for General-Sum Multi-Agent Reinforcement Learning
Addison Kalanther, Sanika Bharvirkar, Shankar Sastry, Chinmay Maheshwari

TL;DR
NePPO is a new MARL method that learns a potential function to compute approximate Nash equilibria in mixed cooperative-competitive multi-agent environments, improving stability and performance.
Contribution
The paper proposes Near-Potential Policy Optimization (NePPO), a novel approach for computing approximate Nash equilibria in general-sum multi-agent reinforcement learning.
Findings
NePPO outperforms IPPO and MAPPO in empirical tests.
The method effectively learns a potential function for mixed cooperative-competitive games.
NePPO provides a stable and scalable solution for general-sum MARL environments.
Abstract
Multi-agent reinforcement learning (MARL) is increasingly used to design learning-enabled agents that interact in shared environments. However, training MARL algorithms in general-sum games remains challenging: learning dynamics can become unstable, and convergence guarantees typically hold only in restricted settings such as two-player zero-sum or fully cooperative games. Moreover, when agents have heterogeneous and potentially conflicting preferences, it is unclear what system-level objective should guide learning. In this paper, we propose a new MARL pipeline called Near-Potential Policy Optimization (NePPO) for computing approximate Nash equilibria in mixed cooperative-competitive environments. The core idea is to learn a player-independent potential function such that the Nash equilibrium of a cooperative game with this potential as the common utility approximates a Nash…
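The near-potential idea in the abstract can be illustrated on a tiny matrix game. The sketch below is not the paper's algorithm. It assumes the standard near-potential notion: a potential is "close" to a game when, for every unilateral deviation, the change in the deviating player's utility differs from the change in the potential by at most some alpha. Under that assumption, a maximizer of the potential is an alpha-approximate Nash equilibrium of the original game. The payoff tables, the hand-picked potential `phi`, and all function names are hypothetical; NePPO would learn the potential rather than fix it by hand.

```python
from itertools import product

# Hypothetical 2-player, 2-action general-sum game (payoffs are made up).
# U[i][a0][a1] is player i's utility under the joint action (a0, a1).
U = [
    [[3.0, 0.0], [1.0, 2.0]],  # player 0
    [[2.0, 0.0], [0.5, 3.0]],  # player 1
]
N_PLAYERS = 2
N_ACTIONS = 2
JOINT_ACTIONS = list(product(range(N_ACTIONS), repeat=N_PLAYERS))


def value(table, joint):
    """Look up a table entry for a joint action (a0, a1)."""
    return table[joint[0]][joint[1]]


def deviate(joint, player, action):
    """Return the joint action with `player` switched to `action`."""
    out = list(joint)
    out[player] = action
    return tuple(out)


def potential_gap(U, phi):
    """Largest mismatch between a player's utility change and the
    potential's change, over all unilateral deviations."""
    gap = 0.0
    for joint in JOINT_ACTIONS:
        for i in range(N_PLAYERS):
            for b in range(N_ACTIONS):
                dev = deviate(joint, i, b)
                du = value(U[i], dev) - value(U[i], joint)
                dphi = value(phi, dev) - value(phi, joint)
                gap = max(gap, abs(du - dphi))
    return gap


def nash_gap(U, joint):
    """Largest gain any single player can obtain by deviating from `joint`
    (zero means `joint` is an exact Nash equilibrium)."""
    return max(
        value(U[i], deviate(joint, i, b)) - value(U[i], joint)
        for i in range(N_PLAYERS)
        for b in range(N_ACTIONS)
    )


# Candidate player-independent potential, hand-picked for illustration.
phi = [[2.5, 0.0], [0.5, 2.5]]

alpha = potential_gap(U, phi)
# A maximizer of phi is a Nash equilibrium of the common-utility game.
best = max(JOINT_ACTIONS, key=lambda j: value(phi, j))

print("potential gap alpha:", alpha)
print("maximizer of phi:", best)
print("Nash gap in the original game:", nash_gap(U, best))
```

The Nash gap at the potential's maximizer can never exceed `alpha`: at a maximizer no deviation increases `phi`, so no player's utility can improve by more than the mismatch bound. A smaller gap for the chosen potential therefore gives a tighter equilibrium approximation.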
