Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization
Shicong Cen, Fan Chen, Yuejie Chi

TL;DR
This paper introduces an entropy-regularized natural policy gradient method for potential games, achieving finite-time convergence to equilibrium with rates independent of action space size and, in some cases, the number of agents.
Contribution
It proposes a decentralized, entropy-regularized NPG algorithm with dimension-free convergence rates for potential games, including identical-interest cases.
Findings
Converges to quantal response equilibrium at a sublinear rate.
Rate is independent of action space size.
Iteration complexity grows at most sublinearly with the number of agents, and is dimension-free for identical-interest games.
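For reference, the quantal response equilibrium named in the findings above is usually written as a softmax fixed point. The block below is the standard logit form, not necessarily the paper's exact notation: the regularization parameter \tau > 0, the action set \mathcal{A}_i, and the marginal utility r_i(a; \pi_{-i}) (agent i's expected utility for action a against the other agents' policies) are symbols assumed here for illustration.

```latex
\pi_i^\star(a)
  \;=\;
  \frac{\exp\!\big(r_i(a;\,\pi_{-i}^\star)/\tau\big)}
       {\sum_{a' \in \mathcal{A}_i} \exp\!\big(r_i(a';\,\pi_{-i}^\star)/\tau\big)}
  \qquad \text{for every agent } i \text{ and every action } a \in \mathcal{A}_i .
```

As \tau shrinks, each agent's policy concentrates on its best responses, so the QRE approaches a Nash equilibrium of the unregularized game.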
Abstract
A major challenge in multi-agent systems is that the system complexity grows dramatically with the number of agents as well as the size of their action spaces, which is typical in real-world scenarios such as autonomous vehicles, robotic teams, network routing, etc. There is hence a pressing need to design decentralized or independent algorithms where the update of each agent is based only on their local observations, without introducing complex communication/coordination mechanisms. In this work, we study the finite-time convergence of independent entropy-regularized natural policy gradient (NPG) methods for potential games, where the difference in an agent's utility function due to unilateral deviation matches exactly that of a common potential function. The proposed entropy-regularized NPG method enables each agent to deploy symmetric, decentralized, and multiplicative…
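The abstract is cut off before the update rule is stated, so the following is only a minimal sketch of what an independent, multiplicative, entropy-regularized update can look like. It assumes the common form pi_i(a) <- pi_i(a)^(1 - eta*tau) * exp(eta * r_i(a)), renormalized, where r_i is agent i's marginal utility; this form, the two-agent identical-interest setting, the 2x2 payoff matrix, and all names (independent_entropy_npg, qre_gap, tau, eta) are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def log_normalize(log_p):
    """Shift log-weights so that exp(log_p) sums to one (stable log-sum-exp)."""
    m = log_p.max()
    return log_p - (m + np.log(np.exp(log_p - m).sum()))


def softmax(z):
    return np.exp(log_normalize(z))


def independent_entropy_npg(payoff, tau=1.0, eta=0.1, iters=1000):
    """Two agents share payoff[a1, a2] (identical-interest game, assumed here).

    Each agent updates from its own marginal utility only, using the assumed
    multiplicative rule pi(a) <- pi(a)^(1 - eta*tau) * exp(eta * r(a)).
    """
    n1, n2 = payoff.shape
    log_p1 = log_normalize(np.zeros(n1))  # uniform initial policies
    log_p2 = log_normalize(np.zeros(n2))
    for _ in range(iters):
        p1, p2 = np.exp(log_p1), np.exp(log_p2)
        r1 = payoff @ p2      # agent 1's expected utility per action
        r2 = payoff.T @ p1    # agent 2's expected utility per action
        # Simultaneous updates, carried out in the log domain.
        log_p1 = log_normalize((1.0 - eta * tau) * log_p1 + eta * r1)
        log_p2 = log_normalize((1.0 - eta * tau) * log_p2 + eta * r2)
    return np.exp(log_p1), np.exp(log_p2)


def qre_gap(payoff, p1, p2, tau):
    """L1 distance of each policy from its softmax response; zero at a QRE."""
    g1 = np.abs(p1 - softmax(payoff @ p2 / tau)).sum()
    g2 = np.abs(p2 - softmax(payoff.T @ p1 / tau)).sum()
    return max(g1, g2)


# Hypothetical coordination game used only to exercise the sketch.
payoff = np.array([[1.0, 0.0], [0.0, 0.5]])
p1, p2 = independent_entropy_npg(payoff, tau=1.0, eta=0.1, iters=1000)
print(p1, p2, qre_gap(payoff, p1, p2, tau=1.0))
```

Neither agent reads the other's policy directly: each update uses only that agent's own expected utilities, which is the sense in which the scheme is independent. The step size and regularization values here were picked so that this small example settles at its QRE; they are not the paper's prescribed choices.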
Taxonomy
Topics: Distributed Control Multi-Agent Systems · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
