A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent
Mehdi Jafarnia-Jahromi, Rahul Jain, Ashutosh Nayyar

TL;DR
This paper introduces PSRL-ZSG, a posterior-sampling (Bayesian) learning algorithm for infinite-horizon zero-sum stochastic games whose Bayesian regret bound holds against an arbitrary opponent and matches the theoretical lower bound in its dependence on T.
Contribution
The paper presents the first online learning algorithm with a Bayesian regret bound of O(HS√AT) for zero-sum stochastic games against an arbitrary opponent, improving on the previous best bound of Wei et al. (2017).
Findings
Achieves a Bayesian regret bound of O(HS√AT) in infinite-horizon, average-reward zero-sum stochastic games
Improves on the previous best regret bound, due to Wei et al. (2017), under the same assumption
Matches the theoretical lower bound in T
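Assuming the usual definition of regret against the game value J* (the page does not restate it), the bound means the per-step regret vanishes as T grows, and quadrupling the horizon only doubles the bound:

```latex
R_T = \mathbb{E}\Big[\sum_{t=1}^{T}\big(J^{*} - r(s_t, a_t, b_t)\big)\Big] = O\big(HS\sqrt{AT}\big)
\;\Longrightarrow\;
\frac{R_T}{T} = O\big(HS\sqrt{A/T}\big) \xrightarrow[T\to\infty]{} 0,
\qquad
\frac{HS\sqrt{A\cdot 4T}}{HS\sqrt{AT}} = 2.
```

Here a_t is the learner's action and b_t the opponent's; this split of the joint action is an illustrative assumption.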
Abstract
In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves a Bayesian regret bound of O(HS√AT) in infinite-horizon zero-sum stochastic games with the average-reward criterion. Here H is an upper bound on the span of the bias function, S is the number of states, A is the number of joint actions, and T is the horizon. We consider the online setting where the opponent cannot be controlled and can follow any arbitrary time-adaptive, history-dependent strategy. Our regret bound improves on the best existing regret bound, due to Wei et al. (2017), under the same assumption and matches the theoretical lower bound in T.
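The abstract does not spell out the algorithm's steps, so the following is only a minimal Python sketch of the generic posterior-sampling pattern the name refers to, not the authors' PSRL-ZSG. It assumes known rewards, a Dirichlet prior over unknown transitions, and a simple doubling epoch schedule; at the start of each epoch it samples one model, solves that sampled game for a maximin policy via relative value iteration, and plays it against an opponent it does not control. All function names (`sample_dirichlet`, `solve_matrix_game`, `relative_value_iteration`, `psrl_zsg_sketch`), the epoch rule, and the Hedge-based matrix-game solver are hypothetical illustration choices.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def solve_matrix_game(M, iters=2000, eta=0.05):
    """Approximate value and maximin row strategy of the zero-sum game x^T M y.

    Both players run Hedge (exponential weights) against each other; the
    time-averaged strategies approach a Nash equilibrium as iters grows.
    """
    m, n = len(M), len(M[0])
    x, y = [1.0 / m] * m, [1.0 / n] * n
    avg_x, avg_y = [0.0] * m, [0.0] * n
    for _ in range(iters):
        for i in range(m):
            avg_x[i] += x[i] / iters
        for j in range(n):
            avg_y[j] += y[j] / iters
        row_gain = [sum(M[i][j] * y[j] for j in range(n)) for i in range(m)]
        col_loss = [sum(M[i][j] * x[i] for i in range(m)) for j in range(n)]
        # Multiplicative update, then renormalize to avoid overflow.
        x = [x[i] * math.exp(eta * row_gain[i]) for i in range(m)]
        y = [y[j] * math.exp(-eta * col_loss[j]) for j in range(n)]
        sx, sy = sum(x), sum(y)
        x, y = [v / sx for v in x], [v / sy for v in y]
    value = sum(avg_x[i] * M[i][j] * avg_y[j] for i in range(m) for j in range(n))
    return value, avg_x


def relative_value_iteration(r, P, sweeps=20, ref=0, game_iters=300):
    """Shapley-style relative value iteration for an average-reward zero-sum game.

    r[s][a][b] is the stage reward, P[s][a][b] the next-state distribution.
    Returns one (approximate) maximin row strategy per state.
    """
    S = len(r)
    h = [0.0] * S
    policy = [None] * S
    for _ in range(sweeps):
        new_h = [0.0] * S
        for s in range(S):
            Q = [[r[s][a][b] + sum(p * h[nxt] for nxt, p in enumerate(P[s][a][b]))
                  for b in range(len(r[s][a]))] for a in range(len(r[s]))]
            new_h[s], policy[s] = solve_matrix_game(Q, iters=game_iters)
        # Subtract the value at a reference state to keep h bounded.
        h = [v - new_h[ref] for v in new_h]
    return policy


def psrl_zsg_sketch(true_r, true_P, opponent, T, rng):
    """Posterior-sampling loop against an uncontrolled opponent (illustrative only).

    Assumptions: rewards are known, only transitions are learned, and epochs
    simply double in length (a simplified, hypothetical schedule).
    """
    S, A1, A2 = len(true_r), len(true_r[0]), len(true_r[0][0])
    # Dirichlet(1, ..., 1) prior over the next-state distribution of each (s, a, b).
    counts = [[[[1.0] * S for _ in range(A2)] for _ in range(A1)] for _ in range(S)]
    s, t, epoch_len, total_reward = 0, 0, 1, 0.0
    while t < T:
        # Sample one model from the posterior and solve it for a maximin policy.
        sampled_P = [[[sample_dirichlet(counts[x][a][b], rng) for b in range(A2)]
                      for a in range(A1)] for x in range(S)]
        policy = relative_value_iteration(true_r, sampled_P)
        for _ in range(min(epoch_len, T - t)):
            a = rng.choices(range(A1), weights=policy[s])[0]
            b = opponent(s, t)  # stand-in for an arbitrary opponent; sees only (s, t) here
            total_reward += true_r[s][a][b]
            s_next = rng.choices(range(S), weights=true_P[s][a][b])[0]
            counts[s][a][b][s_next] += 1.0  # conjugate Dirichlet update
            s, t = s_next, t + 1
        epoch_len *= 2
    return total_reward / T
```

Hedge self-play stands in for the linear program usually used to solve each stage matrix game, purely to keep the sketch dependency-free; it yields only an approximate equilibrium.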
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Decision-Making and Behavioral Economics
