Almost Sure Convergence of Networked Policy Gradient over Time-Varying Networks in Markov Potential Games
Sarper Aydin, Ceyhun Eksin

TL;DR
This paper introduces a networked policy gradient method for Markov potential games with time-varying communication networks, proving almost sure convergence to stationary points without bounded gradient assumptions.
Contribution
It presents a novel convergence proof for networked policy gradient in Markov potential games, accommodating time-varying networks and removing previous bounded gradient constraints.
Findings
Almost sure convergence to a stationary point of the potential value function is proven, with rate O(1/ε²).
Numerical experiments show convergence of local beliefs and gradients.
Networked policy gradient achieves higher rewards than independent updates.
Abstract
We propose networked policy gradient play for solving Markov potential games with continuous and/or discrete state-action pairs. During the game, agents use parametrized and differentiable policies that depend on the current state and the policy parameters of other agents. During training, agents update their policy parameters following stochastic gradients. The gradient estimation involves two consecutive episodes, generating unbiased estimators of reward and policy score functions. In addition, it involves keeping estimates of others' parameters using consensus steps given local estimates received through a time-varying communication network. In Markov potential games, there exists a potential value function among agents with gradients corresponding to the gradients of local value functions. Using this structure, we prove almost sure convergence to a stationary point of the potential…
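The interplay between consensus steps and gradient updates described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the two-episode stochastic gradient estimator is replaced here by the exact (optionally noisy) gradient of a toy quadratic potential, and the function names, the four-agent setup, the alternating pairwise graphs, and the step size are all assumptions made for illustration.

```python
import numpy as np

def mixing_matrix(n, edges):
    """Doubly stochastic weights: each listed pair averages its two values."""
    W = np.eye(n)
    for i, j in edges:
        W[i, i] = W[j, j] = 0.5
        W[i, j] = W[j, i] = 0.5
    return W

def run(num_steps=3000, step=0.05, noise=0.0, seed=0):
    rng = np.random.default_rng(seed)
    n = 4
    # Toy concave potential (assumption): Phi(theta) = b^T theta - 0.5 theta^T A theta.
    A = np.eye(n) + 0.1 * np.ones((n, n))
    b = np.arange(1.0, n + 1.0)
    # Time-varying network: neither graph is connected, but their union is a ring.
    graphs = [mixing_matrix(n, [(0, 1), (2, 3)]),
              mixing_matrix(n, [(1, 2), (3, 0)])]
    idx = np.arange(n)
    theta = np.zeros(n)      # theta[i]: agent i's own parameter
    est = np.zeros((n, n))   # est[i, j]: agent i's estimate of theta[j]
    for t in range(num_steps):
        W = graphs[t % 2]
        est = W @ est              # consensus step with current neighbors
        est[idx, idx] = theta      # each agent knows its own parameter exactly
        # Partial derivative of Phi w.r.t. theta_i, evaluated at agent i's local estimate.
        grad = b - np.einsum("ij,ij->i", A, est)
        grad += noise * rng.standard_normal(n)  # stand-in for estimator noise
        theta = theta + step * grad             # ascent on the potential
    return theta, est, np.linalg.solve(A, b)

theta, est, stationary = run()
print(np.max(np.abs(theta - stationary)))  # gap to the stationary point of Phi
```

In this toy setting each agent only ever mixes estimates with one neighbor per step, yet the local estimates track the true parameters and the parameters approach the stationary point of the potential. Setting `noise` above zero with a constant step size would leave a residual fluctuation; a diminishing step size would be the usual remedy.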
Taxonomy
Topics: Age of Information Optimization · Advanced Wireless Network Optimization · Distributed Sensor Networks and Detection Algorithms
