Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games

Yuling Yan; Gen Li; Yuxin Chen; Jianqing Fan

arXiv:2206.04044·cs.LG·March 18, 2025·1 cites

Open Access

TL;DR

This paper introduces a simple, model-based offline reinforcement learning algorithm for two-player zero-sum Markov games that provably finds approximate Nash equilibria with near-optimal sample complexity, improving upon prior methods.

Contribution

The paper proposes VI-LCB-Game, a pessimistic model-based algorithm with Bernstein-style confidence bounds, achieving minimax optimal sample complexity for offline zero-sum Markov games.
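To make "pessimistic" and "Bernstein-style" concrete, here is a minimal sketch of a generic variance-aware penalty subtracted from an empirical Bellman backup. It is not the paper's exact penalty: the constant `c`, the `log_term` argument, and the function names `bernstein_penalty` and `pessimistic_backup` are assumptions made for illustration only.

```python
import math


def bernstein_penalty(next_values, probs, n_visits, gamma, log_term=1.0, c=1.0):
    """Generic Bernstein-style penalty for one (state, action, action) entry.

    Illustrative only: `c` and `log_term` are hypothetical placeholders,
    not constants taken from the paper.
    """
    v_max = 1.0 / (1.0 - gamma)  # largest possible discounted value
    if n_visits == 0:
        return v_max  # no data: be maximally pessimistic
    mean = sum(p * v for p, v in zip(probs, next_values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, next_values))
    # Variance term shrinks like 1/sqrt(n); range term shrinks like 1/n.
    bonus = math.sqrt(c * log_term * var / n_visits) + c * log_term * v_max / n_visits
    return min(bonus, v_max)


def pessimistic_backup(reward, next_values, probs, n_visits, gamma):
    """Empirical Bellman backup minus the penalty, floored at zero."""
    mean = sum(p * v for p, v in zip(probs, next_values))
    penalty = bernstein_penalty(next_values, probs, n_visits, gamma)
    return max(reward + gamma * mean - penalty, 0.0)
```

Entries that the dataset visits rarely receive a large penalty, so the resulting value estimates lean low wherever the data offers little coverage.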

Findings

01. Achieves near-optimal sample complexity for offline Nash equilibrium learning.

02. Strengthens previous bounds by a factor of $\min\{A, B\}$ and is minimax optimal.

03. Requires no variance reduction or sample splitting, which keeps the algorithm simple.

Abstract

This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data. Specifically, consider a $\gamma$-discounted infinite-horizon Markov game with $S$ states, where the max-player has $A$ actions and the min-player has $B$ actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds, called VI-LCB-Game, that provably finds an $\varepsilon$-approximate Nash equilibrium with a sample complexity no larger than $\frac{C_{\mathrm{clipped}}^{\star} S (A + B)}{(1 - \gamma)^{3} \varepsilon^{2}}$ (up to some log factor). Here, $C_{\mathrm{clipped}}^{\star}$ is some unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-à-vis the target data), and the target accuracy $\varepsilon$ can be any value within $(0, \frac{1}{1 - \gamma}]$. Our…
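As a rough numerical illustration of the bound quoted above, the following sketch evaluates its leading term only. It drops the logarithmic factor and any absolute constant, so the result indicates scaling rather than an actual sample count; the function name and argument names are assumptions made for this example.

```python
def sample_complexity_leading_term(c_clipped, S, A, B, gamma, eps):
    """Leading term C * S * (A + B) / ((1 - gamma)^3 * eps^2).

    Log factors and absolute constants are omitted, so treat the
    output as an order-of-magnitude guide only.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    if not 0.0 < eps <= 1.0 / (1.0 - gamma):
        raise ValueError("eps must lie in (0, 1 / (1 - gamma)]")
    return c_clipped * S * (A + B) / ((1.0 - gamma) ** 3 * eps ** 2)


# Example: 10 states, 2 and 3 actions, gamma = 0.5, eps = 1, coefficient 1.
print(sample_complexity_leading_term(1.0, 10, 2, 3, 0.5, 1.0))  # 400.0
```

The dependence on the action counts is additive, $A + B$, and halving $\varepsilon$ quadruples the leading term.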

Taxonomy

Topics: Reinforcement Learning in Robotics · Markov Chains and Monte Carlo Methods · Advanced Bandit Algorithms Research