A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent

Mehdi Jafarnia-Jahromi; Rahul Jain; Ashutosh Nayyar

arXiv:2109.03396 · cs.LG · March 12, 2024

TL;DR

This paper introduces PSRL-ZSG, a posterior-sampling learning algorithm for infinite-horizon, average-reward zero-sum stochastic games. It achieves a Bayesian regret bound of Õ(HS√(AT)) even when the opponent follows an arbitrary, history-dependent strategy.

Contribution

The paper presents the first online algorithm with a Bayesian regret bound of Õ(HS√(AT)) for zero-sum stochastic games against an arbitrary opponent, improving on the best previous bound, due to Wei et al. (2017), in its dependence on T.

Findings

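01. Achieves a Bayesian regret bound of Õ(HS√(AT)) in infinite-horizon zero-sum stochastic games with the average-reward criterion

02. Improves on the best existing regret bound, due to Wei et al. (2017), under the same assumption

03. Matches the theoretical lower bound in T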

Abstract

In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves a Bayesian regret bound of $\tilde{O}(HS\sqrt{AT})$ in infinite-horizon zero-sum stochastic games with the average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions, and $T$ is the horizon. We consider the online setting where the opponent cannot be controlled and can follow any arbitrary time-adaptive, history-dependent strategy. Our regret bound improves on the best existing regret bound of $\tilde{O}(\sqrt[3]{DS^{2}AT^{2}})$ by Wei et al. (2017) under the same assumption and matches the theoretical lower bound in $T$.
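
To make the comparison in $T$ concrete, the sketch below evaluates the two bounds stated in the abstract, read as $HS\sqrt{AT}$ for PSRL-ZSG and $\sqrt[3]{DS^{2}AT^{2}}$ for Wei et al. (2017), with constants and logarithmic factors dropped. The function names and the values chosen for $H$, $D$, $S$, and $A$ are hypothetical and serve only as illustration; $D$ is simply the parameter that appears in the earlier bound. This is not the algorithm itself, only arithmetic on the bounds.

```python
import math

def psrl_zsg_bound(H, S, A, T):
    """H * S * sqrt(A * T), with constants and log factors dropped."""
    return H * S * math.sqrt(A * T)

def wei_2017_bound(D, S, A, T):
    """(D * S^2 * A * T^2)^(1/3), with constants and log factors dropped."""
    return (D * S**2 * A * T**2) ** (1.0 / 3.0)

# Hypothetical problem sizes, chosen only for illustration.
H, D, S, A = 5.0, 5.0, 10, 4

for T in (10**4, 10**6, 10**8, 10**10):
    new = psrl_zsg_bound(H, S, A, T)
    old = wei_2017_bound(D, S, A, T)
    # Regret divided by T is the average per-step regret implied by each bound.
    print(f"T={T:.0e}  new/T={new / T:.5f}  old/T={old / T:.5f}  old/new={old / new:.3f}")
```

Since the first expression grows as $T^{1/2}$ and the second as $T^{2/3}$, their ratio grows as $T^{1/6}$. For small $T$ the problem-dependent factors can make either expression the larger one, so the improvement is a statement about the dependence on $T$.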

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Decision-Making and Behavioral Economics