Bayesian Learning in Episodic Zero-Sum Games

Chang-Wei Yueh; Andy Zhao; Ashutosh Nayyar; Rahul Jain

arXiv:2603.20604 · cs.LG · March 24, 2026

Open Access

TL;DR

This paper analyzes Bayesian learning in episodic, finite-horizon zero-sum Markov games with unknown transition and reward models. It proves expected regret bounds for a posterior sampling algorithm and evaluates the algorithm experimentally.

Contribution

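The paper gives an expected-regret analysis of posterior sampling in zero-sum Markov games, covering both the case where both players use posterior sampling and the case where the opponent follows an arbitrary learning algorithm, and compares its performance with fictitious-play baselines.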

Findings

01. Posterior sampling achieves sublinear regret in zero-sum Markov games.

02. The regret bound is of order $O(HS\sqrt{ABHK\log(SABHK)})$.

03. Experiments show favorable performance against fictitious-play baselines.
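To make "sublinear" concrete, the short sketch below evaluates the order term from the second finding for a few episode counts. It assumes the usual reading of the symbols, which this listing does not define: H as the horizon, S as the number of states, A and B as the two players' action counts, and K as the number of episodes. Constants hidden by the O-notation are omitted, the problem sizes are hypothetical, and `regret_bound_order` is an illustrative name rather than anything from the paper.

```python
import math

def regret_bound_order(H, S, A, B, K):
    """Order term H*S*sqrt(A*B*H*K*log(S*A*B*H*K)); hidden constants are omitted."""
    n = S * A * B * H * K
    return H * S * math.sqrt(A * B * H * K * math.log(n))

# Hypothetical problem sizes, chosen only for illustration.
H, S, A, B = 5, 10, 3, 3
for K in (100, 10_000, 1_000_000):
    total = regret_bound_order(H, S, A, B, K)
    print(f"K={K:>9}: bound order={total:14.1f}, per episode={total / K:10.4f}")
```

The total grows roughly like the square root of K (up to the logarithmic factor), so the per-episode figure shrinks as K grows, which is what sublinear regret means.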

Abstract

We study Bayesian learning in episodic, finite-horizon zero-sum Markov games with unknown transition and reward models. We investigate a posterior sampling algorithm in which each player maintains a Bayesian posterior over the game model, independently samples a game model at the beginning of each episode, and computes an equilibrium policy for the sampled model. We analyze two settings: (i) both players use the posterior sampling algorithm, and (ii) only one player uses posterior sampling while the opponent follows an arbitrary learning algorithm. In each setting, we provide guarantees on the expected regret of the posterior sampling agent. Our notion of regret compares the expected total reward of the learning agent against the expected total reward under equilibrium policies of the true game. Our main theoretical result is an expected regret bound for the posterior sampling agent of order…
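The sample-then-solve loop described in the abstract can be sketched as follows. This is a minimal illustration under several simplifying assumptions that are not taken from the paper: rewards are treated as known (the paper also treats them as unknown), transitions are stationary with independent Dirichlet posteriors, and each player has exactly two actions so that every stage game can be solved in closed form. All names (`solve_2x2`, `sample_transitions`, `equilibrium_policies`, `run`) and all sizes are hypothetical.

```python
import random

S, A, B, H = 3, 2, 2, 4  # hypothetical sizes; A = B = 2 so each stage game is 2x2

def solve_2x2(M):
    """Solve a 2x2 zero-sum matrix game in which the row player maximizes.
    Returns (value, P(row plays 0), P(column plays 0))."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:  # a pure saddle point exists
        p = 1.0 if min(a, b) >= min(c, d) else 0.0
        q = 1.0 if max(a, c) <= max(b, d) else 0.0
        return maximin, p, q
    denom = (a - b) + (d - c)  # nonzero whenever no saddle point exists
    return (a * d - b * c) / denom, (d - c) / denom, (d - b) / denom

def sample_transitions(counts):
    """Draw one transition model from independent Dirichlet posteriors."""
    P = {}
    for key, alpha in counts.items():
        g = [random.gammavariate(x, 1.0) for x in alpha]
        total = sum(g)
        P[key] = [x / total for x in g]
    return P

def equilibrium_policies(P, R):
    """Backward induction on a model: one 2x2 stage game per (step, state)."""
    V = [0.0] * S
    row_pol, col_pol = [None] * H, [None] * H
    for h in reversed(range(H)):
        new_V, row_pol[h], col_pol[h] = [], [], []
        for s in range(S):
            Q = [[R[s, a, b] + sum(P[s, a, b][t] * V[t] for t in range(S))
                  for b in range(B)] for a in range(A)]
            v, p, q = solve_2x2(Q)
            new_V.append(v)
            row_pol[h].append(p)
            col_pol[h].append(q)
        V = new_V
    return row_pol, col_pol, V

def run(num_episodes=50, seed=0):
    random.seed(seed)
    keys = [(s, a, b) for s in range(S) for a in range(A) for b in range(B)]
    # Hypothetical true game; players know R (an assumption) but not true_P.
    true_P = sample_transitions({k: [1.0] * S for k in keys})
    R = {k: random.random() for k in keys}
    counts = {k: [1.0] * S for k in keys}  # uniform Dirichlet prior
    returns = []
    for _ in range(num_episodes):
        # Each player independently samples a model and solves it for equilibrium.
        row_pol, _, _ = equilibrium_policies(sample_transitions(counts), R)
        _, col_pol, _ = equilibrium_policies(sample_transitions(counts), R)
        s, total = 0, 0.0
        for h in range(H):
            a = 0 if random.random() < row_pol[h][s] else 1
            b = 0 if random.random() < col_pol[h][s] else 1
            total += R[s, a, b]
            nxt = random.choices(range(S), weights=true_P[s, a, b])[0]
            counts[s, a, b][nxt] += 1.0  # posterior update from the observed transition
            s = nxt
        returns.append(total)
    true_value = equilibrium_policies(true_P, R)[2][0]
    return returns, true_value, counts

returns, true_value, counts = run()
print(f"mean return: {sum(returns) / len(returns):.3f}, "
      f"equilibrium value of the true game: {true_value:.3f}")
```

Both players here see the same transitions, so they share one posterior but draw separate samples from it at the start of each episode. The printed comparison between the row player's realized return and the equilibrium value of the true game mirrors the regret notion in the abstract, although a single short run like this is only a sanity check and not a regret estimate.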

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Game Theory and Applications