Learning in Multi-Player Stochastic Games
William Brown

TL;DR
This paper investigates learning algorithms in multi-player stochastic games. It proves computational hardness results and proposes algorithms that generate correlated equilibria, with runtime depending on properties of the game such as the horizon and mixing behavior.
Contribution
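It introduces algorithms that generate extensive-form correlated equilibria in stochastic games: one with runtime exponential in the horizon, and a polynomial-time variant for fast-mixing games. It also gives a hardness result for adversarial MDPs that rules out a black-box reduction to no-regret learning.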
Findings
NP-hardness of achieving sublinear regret in adversarial MDPs, even for a horizon of 3
Algorithms for extensive-form correlated equilibrium with exponential runtime in horizon
Polynomial-time algorithms for fast-mixing stochastic games
Abstract
We consider the problem of simultaneous learning in stochastic games with many players in the finite-horizon setting. While the typical target solution for a stochastic game is a Nash equilibrium, this is intractable with many players. We instead focus on variants of correlated equilibria, such as those studied for extensive-form games. We begin with a hardness result for the adversarial MDP problem: even for a horizon of 3, obtaining sublinear regret against the best non-stationary policy is NP-hard when both rewards and transitions are adversarial. This implies that convergence to even the weakest natural solution concept, normal-form coarse correlated equilibrium, is not possible via black-box reduction to a no-regret algorithm, even in stochastic games with constant horizon (unless NP ⊆ BPP). Instead, we turn to a different target:…
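As a rough illustration of the solution concept named in the abstract, the sketch below checks whether a distribution over action profiles is a normal-form coarse correlated equilibrium in a small one-shot game: no player may gain by ignoring the recommended profile and committing to one fixed action. This is not the paper's algorithm and does not involve stochastic games or MDPs; the function names, the payoff encoding, and the example game (a standard "chicken" game) are assumptions made for illustration.

```python
def cce_violation(payoff, num_actions, dist):
    """Largest gain any player gets from committing to one fixed action
    instead of following the profile drawn from `dist`.

    payoff: dict mapping an action profile (tuple) to a tuple of payoffs
    num_actions: number of actions available to each player
    dist: dict mapping action profiles to probabilities
    """
    worst = 0.0
    for i, n_i in enumerate(num_actions):
        # Expected payoff to player i when everyone follows the draw.
        follow = sum(p * payoff[a][i] for a, p in dist.items())
        for dev in range(n_i):
            # Player i ignores the draw and always plays `dev`.
            deviate = sum(
                p * payoff[a[:i] + (dev,) + a[i + 1:]][i]
                for a, p in dist.items()
            )
            worst = max(worst, deviate - follow)
    return worst


def is_cce(payoff, num_actions, dist, tol=1e-9):
    """True if no fixed-action deviation gains more than `tol`."""
    return cce_violation(payoff, num_actions, dist) <= tol


# Hypothetical example: "chicken", with action 0 = dare and 1 = yield.
chicken = {
    (0, 0): (0, 0),
    (0, 1): (7, 2),
    (1, 0): (2, 7),
    (1, 1): (6, 6),
}
# Uniform over every profile except (dare, dare).
uniform = {(0, 1): 1 / 3, (1, 0): 1 / 3, (1, 1): 1 / 3}
# Point mass on (dare, dare).
crash = {(0, 0): 1.0}
```

Here `is_cce(chicken, [2, 2], uniform)` holds, while `crash` fails the check because either player gains by always yielding. The check only considers deviations to a single fixed action, which is what makes this the coarse notion; the extensive-form correlated equilibria targeted in the paper involve a richer class of deviations.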
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Auction Theory and Applications
