Decentralized Policy Gradient for Nash Equilibria Learning of General-sum Stochastic Games
Yan Chen, Tao Li

TL;DR
This paper develops decentralized algorithms for learning Nash equilibria in general-sum stochastic games with unknown dynamics, using variational inequalities and gradient estimators. It proves convergence to weighted asymptotic Nash equilibria both when exact pseudo gradients are available and when they must be estimated.
Contribution
It introduces a novel two-loop algorithm for exact pseudo gradients and a decentralized method with Monte-Carlo gradient estimation for unknown pseudo gradients, both converging to weighted asymptotic Nash equilibria.
Findings
Converges to k^{1/2}-weighted asymptotic Nash equilibrium with exact pseudo gradients.
Achieves convergence to k^{1/4}-weighted asymptotic Nash equilibrium with unknown pseudo gradients.
Proposes a decentralized algorithm for stochastic environments in which each agent observes only the environment state and its own immediate reward.
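The Monte-Carlo estimation idea behind the unknown-pseudo-gradient case can be illustrated with a minimal sketch. The paper's own estimator is not reproduced in this summary, so the code below swaps in the standard score-function (REINFORCE) estimator on a hypothetical single-state, two-action game. The reward table `REWARD`, the opponent probability `OPPONENT_PROB_B1`, and all function names are assumptions made for illustration. The agent samples its own action, sees only its own reward, and averages reward times the score `d/dtheta log pi(a)`.

```python
import math
import random

# Hypothetical reward table for agent 1 (an assumption, not from the paper):
# REWARD[a][b], with a = own action and b = opponent action.
REWARD = [[1.0, 0.0], [0.0, 2.0]]
OPPONENT_PROB_B1 = 0.5  # opponent's policy; hidden from agent 1


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def sample_own_reward(a, rng):
    """Environment step: agent 1 receives its own reward and never sees b."""
    b = 1 if rng.random() < OPPONENT_PROB_B1 else 0
    return REWARD[a][b]


def mc_gradient(theta, n_samples, rng):
    """Score-function estimate of dJ/dtheta for the policy pi(a=1) = sigmoid(theta)."""
    p = sigmoid(theta)
    total = 0.0
    for _ in range(n_samples):
        a = 1 if rng.random() < p else 0
        r = sample_own_reward(a, rng)
        total += r * (a - p)  # d/dtheta log pi(a) equals a - p for this policy
    return total / n_samples


def exact_gradient(theta):
    """Closed form, available only because this toy game is fully known."""
    p = sigmoid(theta)
    q = OPPONENT_PROB_B1
    mean_r = [(1 - q) * row[0] + q * row[1] for row in REWARD]
    return p * (1 - p) * (mean_r[1] - mean_r[0])


rng = random.Random(0)
print(mc_gradient(0.0, 20000, rng), exact_gradient(0.0))
```

The estimate is unbiased but noisy, which is consistent with the weaker rate reported for the estimated-gradient case, although the sketch itself says nothing about the paper's rates.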
Abstract
We study Nash equilibria learning of a general-sum stochastic game with an unknown transition probability density function. Agents take actions at the current environment state, and their joint action influences the transition of the environment state and their immediate rewards. Each agent only observes the environment state and its own immediate reward and does not know the actions or immediate rewards of others. We introduce the concepts of weighted asymptotic Nash equilibrium with probability 1 and in probability. For the case with exact pseudo gradients, we design a two-loop algorithm based on the equivalence between Nash equilibrium and variational inequality problems. In the outer loop, we sequentially update a constructed strongly monotone variational inequality by updating a proximal parameter while employing a single-call extra-gradient algorithm in the inner loop for solving the…
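The two-loop structure described in the abstract can be sketched on a toy problem. This is a rough illustration under stated assumptions, not the authors' algorithm. The game is the unconstrained bilinear game min_x max_y x*y, whose pseudo-gradient operator F(x, y) = (y, -x) is monotone but not strongly monotone. The outer loop is taken to re-center a proximal term mu * (z - anchor), and the inner loop uses a Popov-style extra-gradient step that makes one new operator evaluation per iteration. The step size, loop counts, and all names are hypothetical, and a constrained version would add a projection after each update.

```python
import math


def pseudo_gradient(z):
    """Operator F(x, y) = (y, -x) of the toy game min_x max_y x*y."""
    x, y = z
    return (y, -x)


def regularized(z, anchor, mu):
    """F(z) + mu * (z - anchor), which is strongly monotone for mu > 0."""
    g = pseudo_gradient(z)
    return tuple(gi + mu * (zi - ai) for gi, zi, ai in zip(g, z, anchor))


def inner_single_call_eg(anchor, mu, eta, steps):
    """Popov-style extra-gradient: one new operator evaluation per step."""
    z = anchor
    g_prev = regularized(z, anchor, mu)  # one warm-up evaluation
    for _ in range(steps):
        # Extrapolate with the stored gradient instead of a fresh one.
        z_half = tuple(zi - eta * gi for zi, gi in zip(z, g_prev))
        g_prev = regularized(z_half, anchor, mu)  # the single call of this step
        z = tuple(zi - eta * gi for zi, gi in zip(z, g_prev))
    return z


def two_loop(z0, mu=1.0, eta=0.2, outer=40, inner=100):
    """Outer loop: re-center the proximal term at the latest inner solution."""
    z = z0
    for _ in range(outer):
        z = inner_single_call_eg(z, mu, eta, inner)
    return z


z_final = two_loop((1.0, 1.0))
print(math.hypot(*z_final))  # distance to the equilibrium (0, 0)
```

Plain simultaneous gradient steps do not converge on this bilinear game with a constant step size, which is why the sketch pairs regularization with an extra-gradient inner solver.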
Taxonomy
Topics: Advanced Bandit Algorithms Research · Economic Policies and Impacts · Reinforcement Learning in Robotics
