Bayesian Learning in Episodic Zero-Sum Games
Chang-Wei Yueh, Andy Zhao, Ashutosh Nayyar, Rahul Jain

TL;DR
This paper analyzes Bayesian learning in episodic zero-sum Markov games, providing regret bounds for posterior sampling algorithms and demonstrating their effectiveness through experiments.
Contribution
The paper gives a theoretical analysis of posterior sampling in zero-sum Markov games, with expected-regret guarantees, and compares its empirical performance with baseline methods.
Findings
Posterior sampling achieves sublinear regret in zero-sum Markov games.
The regret bound is of order $O(HS\sqrt{ABHK\log(SABHK)})$.
Experiments show favorable performance against fictitious-play baselines.
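To make the "sublinear" claim concrete, the order-level bound above can be evaluated numerically. This is only an illustrative sketch: the summary does not define the symbols, so the code assumes the usual reading ($S$ states, $A$ and $B$ actions for the two players, horizon $H$, $K$ episodes), drops all constants, and uses made-up problem sizes.

```python
import math

def regret_bound(H, S, A, B, K):
    """Order-level bound H*S*sqrt(A*B*H*K*log(S*A*B*H*K)); constants are dropped."""
    return H * S * math.sqrt(A * B * H * K * math.log(S * A * B * H * K))

# Hypothetical problem sizes, chosen only for illustration.
H, S, A, B = 5, 10, 3, 3
episodes = (10**2, 10**4, 10**6)

# Sublinear total regret means the bound divided by K shrinks as K grows.
per_episode = [regret_bound(H, S, A, B, K) / K for K in episodes]
for K, r in zip(episodes, per_episode):
    print(f"K={K:>8}: bound/K = {r:.3f}")
```

Because the bound grows like $\sqrt{K \log K}$, the per-episode value falls roughly as $\sqrt{\log K / K}$.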
Abstract
We study Bayesian learning in episodic, finite-horizon zero-sum Markov games with unknown transition and reward models. We investigate a posterior sampling algorithm in which each player maintains a Bayesian posterior over the game model, independently samples a game model at the beginning of each episode, and computes an equilibrium policy for the sampled model. We analyze two settings: (i) both players use the posterior sampling algorithm, and (ii) only one player uses posterior sampling while the opponent follows an arbitrary learning algorithm. In each setting, we provide guarantees on the expected regret of the posterior sampling agent. Our notion of regret compares the expected total reward of the learning agent against the expected total reward under equilibrium policies of the true game. Our main theoretical result is an expected regret bound for the posterior sampling agent of order…
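The per-episode step described in the abstract (sample a model from the posterior, then compute an equilibrium policy for it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a Dirichlet posterior over transitions, treats rewards as known for brevity (the paper takes them as unknown), and solves each stage game with a linear program. The function names `solve_matrix_game`, `equilibrium_policy`, and `sample_model` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(M):
    """Maximin mixed strategy and game value for the row (max) player of M."""
    n_a, n_b = M.shape
    c = np.zeros(n_a + 1)
    c[-1] = -1.0                                   # maximize v == minimize -v
    A_ub = np.hstack([-M.T, np.ones((n_b, 1))])    # v - x^T M[:, b] <= 0 for each b
    b_ub = np.zeros(n_b)
    A_eq = np.ones((1, n_a + 1))
    A_eq[0, -1] = 0.0                              # probabilities sum to 1
    bounds = [(0, None)] * n_a + [(None, None)]    # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    x = np.clip(res.x[:n_a], 0.0, None)
    return x / x.sum(), res.x[-1]

def equilibrium_policy(P, R, H):
    """Backward induction for the max player on a model (P, R).
    P: transitions of shape (S, A, B, S); R: rewards of shape (S, A, B)."""
    S, A, _ = R.shape
    V = np.zeros(S)
    policy = np.zeros((H, S, A))
    for h in reversed(range(H)):
        Q = R + P @ V                              # stage-game payoffs, shape (S, A, B)
        V_next = np.zeros(S)
        for s in range(S):
            policy[h, s], V_next[s] = solve_matrix_game(Q[s])
        V = V_next
    return policy, V

def sample_model(counts, rng):
    """Draw transitions from a Dirichlet posterior; counts include the prior."""
    S = counts.shape[-1]
    rows = counts.reshape(-1, S)
    P = np.array([rng.dirichlet(row) for row in rows])
    return P.reshape(counts.shape)

# One episode's planning step on a small made-up game.
rng = np.random.default_rng(0)
S, A, B, H = 3, 2, 2, 4
R = rng.uniform(size=(S, A, B))                    # assumed known, in [0, 1]
counts = np.ones((S, A, B, S))                     # Dirichlet(1, ..., 1) prior
P_sampled = sample_model(counts, rng)
policy, V = equilibrium_policy(P_sampled, R, H)
```

After playing the episode, each observed transition `(s, a, b, s_next)` would increment `counts[s, a, b, s_next]` before the next sample is drawn. In setting (i), the opponent would run the same routine with its own independent sample, solving for the minimizing strategy instead.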
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Game Theory and Applications
