Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
Yuling Yan, Gen Li, Yuxin Chen, Jianqing Fan

TL;DR
This paper introduces a simple, model-based offline reinforcement learning algorithm for two-player zero-sum Markov games that provably finds approximate Nash equilibria with near-optimal sample complexity, improving upon prior methods.
Contribution
The paper proposes VI-LCB-Game, a pessimistic model-based algorithm with Bernstein-style confidence bounds, achieving minimax optimal sample complexity for offline zero-sum Markov games.
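To make "pessimistic" and "Bernstein-style" concrete, the following is a minimal sketch of one lower-confidence Bellman backup for a single (state, max-action, min-action) entry. It is not the paper's exact procedure: the function names, the constant `c_b`, and the `log_term` placeholder are all assumptions made for illustration, and the per-state matrix-game solve that a full value iteration would need is omitted.

```python
import math


def empirical_variance(probs, values):
    """Variance of `values` under the empirical next-state distribution `probs`."""
    mean = sum(p * v for p, v in zip(probs, values))
    return sum(p * (v - mean) ** 2 for p, v in zip(probs, values))


def bernstein_penalty(probs, values, n_visits, gamma, c_b=1.0, log_term=1.0):
    """Generic Bernstein-style penalty (hypothetical constants c_b, log_term).

    Takes the larger of a variance-driven term and a range-driven term,
    then caps the result at the maximum possible value 1 / (1 - gamma).
    """
    horizon = 1.0 / (1.0 - gamma)
    if n_visits == 0:
        return horizon  # no data for this entry: be maximally pessimistic
    var_term = math.sqrt(c_b * log_term * empirical_variance(probs, values) / n_visits)
    range_term = 2.0 * c_b * log_term * horizon / n_visits
    return min(max(var_term, range_term), horizon)


def pessimistic_q(reward, probs, values, n_visits, gamma):
    """One pessimistic backup: empirical Bellman target minus penalty, floored at 0."""
    backup = reward + gamma * sum(p * v for p, v in zip(probs, values))
    return max(backup - bernstein_penalty(probs, values, n_visits, gamma), 0.0)


# The lower estimate tightens toward the empirical backup as the visit count grows.
probs, values = [0.5, 0.5], [0.0, 10.0]
for n in (1, 100, 10_000):
    print(n, pessimistic_q(1.0, probs, values, n, gamma=0.9))
```

The variance term is what distinguishes a Bernstein-style bound from a Hoeffding-style one: entries whose next-state values vary little are penalized less, even at the same visit count.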
Findings
Achieves near-optimal sample complexity for offline Nash equilibrium learning.
Strengthens previous bounds by a factor of min{A,B} and is minimax optimal.
Algorithmic simplicity without variance reduction or sample splitting.
Abstract
This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data. Specifically, consider a $\gamma$-discounted infinite-horizon Markov game with $S$ states, where the max-player has $A$ actions and the min-player has $B$ actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds -- called VI-LCB-Game -- that provably finds an $\varepsilon$-approximate Nash equilibrium with a sample complexity no larger than $\frac{C_{\mathsf{clipped}}^{\star} S(A+B)}{(1-\gamma)^{3}\varepsilon^{2}}$ (up to some log factor). Here, $C_{\mathsf{clipped}}^{\star}$ is some unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-à-vis the target data), and the target accuracy $\varepsilon$ can be any value within $\big(0, \frac{1}{1-\gamma}\big]$. Our…
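As a rough worked example of how the stated bound scales, the sketch below evaluates the leading-order expression C S (A+B) / ((1-γ)^3 ε^2), assuming that is the form given in the published abstract and dropping the log factor. The function and argument names are hypothetical. The second helper compares A·B with A+B purely as an illustration of why moving from a product to a sum saves a factor on the order of min{A,B}; it is not a statement of any specific prior bound.

```python
def sample_bound(c_clipped, num_states, num_a, num_b, gamma, eps):
    """Leading-order bound C * S * (A + B) / ((1 - gamma)^3 * eps^2), log factors dropped."""
    if not 0.0 < eps <= 1.0 / (1.0 - gamma):
        raise ValueError("eps is expected to lie in (0, 1/(1-gamma)]")
    return c_clipped * num_states * (num_a + num_b) / ((1.0 - gamma) ** 3 * eps ** 2)


def product_over_sum(num_a, num_b):
    """Ratio A*B / (A+B); it always lies between min(A,B)/2 and min(A,B)."""
    return num_a * num_b / (num_a + num_b)


# Halving the target accuracy quadruples the bound (1 / eps^2 scaling).
print(sample_bound(1.0, 10, 5, 5, gamma=0.5, eps=1.0))
print(sample_bound(1.0, 10, 5, 5, gamma=0.5, eps=0.5))

# With many actions on one side, the product-to-sum saving approaches min(A, B).
print(product_over_sum(4, 4), product_over_sum(4, 1000))
```

The dependence on the effective horizon 1/(1-γ) is cubic, so in practice the discount factor tends to dominate the bound well before the action counts do.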
Taxonomy
Topics: Reinforcement Learning in Robotics · Markov Chains and Monte Carlo Methods · Advanced Bandit Algorithms Research
