Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic Games with Independent Chains
S. Rasoul Etesami

TL;DR
This paper introduces polynomial-time algorithms for learning stationary Nash equilibrium policies in a subclass of n-player stochastic games with independent chains, even with limited payoff information and no observation of other players' states or actions.
Contribution
It develops the first polynomial-time learning algorithms for this class of games, based on dual averaging and dual mirror descent. The algorithms come with almost-sure or in-expectation convergence guarantees, and with polynomial iteration bounds under additional assumptions on the reward functions.
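The sketch below gives a rough feel for dual averaging. It runs entropic dual averaging (follow-the-regularized-leader with an entropy regularizer) for both players of matching pennies. Matching pennies is a single-state game chosen here as a hypothetical stand-in. This is not the paper's algorithm, which works with stationary policies over each player's internal chain. The following are all illustrative assumptions:

- the function name `dual_averaging_matching_pennies`;
- the step size $1/\sqrt{t}$;
- the Gaussian noise model for the payoff-gradient estimates.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax: maps accumulated scores to a point in the simplex.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def dual_averaging_matching_pennies(T=20000, noise=0.1, seed=0):
    """Entropic dual averaging for both players of matching pennies (illustrative).

    Hypothetical single-state stand-in: each player updates from a noisy
    estimate of its own payoff gradient. Returns both players' time-averaged
    mixed strategies.
    """
    rng = np.random.default_rng(seed)
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # row player's payoffs; column player gets -A
    y_row = np.zeros(2)  # accumulated (dual) gradient estimates, row player
    y_col = np.zeros(2)  # accumulated (dual) gradient estimates, column player
    avg_row = np.zeros(2)
    avg_col = np.zeros(2)
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)    # decreasing step size (assumed schedule)
        x = softmax(eta * y_row)  # argmax_x <y, x> - entropy(x) / eta
        q = softmax(eta * y_col)
        avg_row += x
        avg_col += q
        # Noisy payoff gradients: d/dx (x^T A q) = A q and d/dq (-x^T A q) = -A^T x.
        g_row = A @ q + noise * rng.standard_normal(2)
        g_col = -A.T @ x + noise * rng.standard_normal(2)
        # Dual averaging: accumulate every past gradient estimate.
        y_row += g_row
        y_col += g_col
    return avg_row / T, avg_col / T
```

Because matching pennies is zero-sum, each player's time-averaged strategy should approach the mixed equilibrium $(0.5, 0.5)$ as `T` grows. This follows from the standard argument that turns low regret into an approximate equilibrium.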
Findings
The algorithms converge, in terms of the averaged Nikaido-Isoda distance, to the set of ε-NE policies almost surely or in expectation.
Under social concavity, polynomial bounds on the number of iterations needed to reach an ε-NE.
Numerical experiments demonstrate effectiveness in energy management scenarios.
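For reference, one common formulation of social concavity is given below. It applies to an $n$-player game with utilities $u_i$ and strategies $x_i$ in convex sets $X_i$. The paper may state a variant adapted to stationary policies, so treat this as background rather than its exact assumption.

```latex
\text{There exist } \lambda_1,\dots,\lambda_n > 0 \text{ such that:}\\
\text{(i) } x \mapsto \sum_{i=1}^{n} \lambda_i\, u_i(x) \text{ is concave on } X_1 \times \dots \times X_n,\\
\text{(ii) for every } i \text{ and every fixed } x_i \in X_i,\; x_{-i} \mapsto u_i(x_i, x_{-i}) \text{ is convex.}
```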
Abstract
We consider a subclass of $n$-player stochastic games, in which players have their own internal state/action spaces while they are coupled through their payoff functions. It is assumed that players' internal chains are driven by independent transition probabilities. Moreover, players can receive only realizations of their payoffs, not the actual functions, and cannot observe each other's states/actions. For this class of games, we first show that finding a stationary Nash equilibrium (NE) policy without any assumption on the reward functions is intractable. However, for general reward functions, we develop polynomial-time learning algorithms based on dual averaging and dual mirror descent, which converge in terms of the averaged Nikaido-Isoda distance to the set of $\epsilon$-NE policies almost surely or in expectation. In particular, under extra assumptions on the reward functions…
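The sketch below makes the Nikaido-Isoda distance concrete. It computes one standard form of the Nikaido-Isoda gap, $\sum_i \big[\max_{x_i'} u_i(x_i', x_{-i}) - u_i(x)\big]$, for a two-player game in mixed strategies. This is a single-state simplification (the paper works with stationary policies of $n$ players), and the names `nikaido_isoda_gap` and `is_eps_ne` are hypothetical. Each summand is nonnegative. So a gap of at most $\epsilon$ implies that no player can gain more than $\epsilon$ by deviating, i.e., an $\epsilon$-NE.

```python
import numpy as np


def nikaido_isoda_gap(A, B, x, y):
    """Sum over both players of their best-response improvement.

    A, B: payoff matrices of the row and column player (row player gets x^T A y,
    column player gets x^T B y). x, y: mixed strategies. Returns 0 exactly at a NE.
    """
    best_row = np.max(A @ y)  # value of the row player's best pure response to y
    best_col = np.max(x @ B)  # value of the column player's best pure response to x
    return (best_row - x @ A @ y) + (best_col - x @ B @ y)


def is_eps_ne(A, B, x, y, eps):
    # epsilon-NE: each player's individual improvement is at most eps.
    return (np.max(A @ y) - x @ A @ y <= eps) and (np.max(x @ B) - x @ B @ y <= eps)
```

For matching pennies (`A = [[1, -1], [-1, 1]]`, `B = -A`), the gap is 0 when both players mix uniformly. It is 1 when the row player commits to its first action against a uniform opponent, because the column player could then gain 1 by always mismatching.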
Taxonomy
Topics: Smart Grid Energy Management · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics
