Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games

Songtao Feng; Ming Yin; Yu-Xiang Wang; Jing Yang; Yingbin Liang

arXiv:2308.08858 · cs.LG · June 7, 2024

TL;DR

This paper introduces a model-free Q-learning algorithm for zero-sum Markov games that achieves the same optimal sample complexity as model-based methods, significantly improving sample efficiency in multi-agent reinforcement learning.

Contribution

It presents the first model-free algorithm that matches the optimal sample complexity of model-based algorithms for zero-sum Markov games, using a value function update that combines reference-advantage variance reduction with optimistic and pessimistic value functions.

Findings

01

Achieves optimal $H$-dependence in sample complexity.

02

Uses variance reduction with reference-advantage decomposition.

03

Introduces optimistic and pessimistic value functions for better efficiency.
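
As a rough illustration of the second finding, the identity behind reference-advantage decomposition can be written in its generic form. The symbols $P_h$ (transition operator at step $h$), $V_{h+1}$ (current value estimate), and $V^{\mathrm{ref}}_{h+1}$ (a fixed reference value) are notation assumed here for the sketch and are not taken from this page; the paper's exact update rule may differ.

```latex
% Generic reference-advantage decomposition (a sketch, not the paper's exact update):
% the expected next-step value is split into a reference part and an advantage part.
\[
  P_h V_{h+1}
  \;=\;
  \underbrace{P_h V^{\mathrm{ref}}_{h+1}}_{\text{reference part}}
  \;+\;
  \underbrace{P_h \bigl( V_{h+1} - V^{\mathrm{ref}}_{h+1} \bigr)}_{\text{advantage part}}
\]
```

The usual motivation is that the reference part stays fixed and can be estimated from a large pool of samples, while the advantage part has small magnitude once $V_{h+1}$ is close to $V^{\mathrm{ref}}_{h+1}$, so the combined estimate tends to have lower variance.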

Abstract

The problem of two-player zero-sum Markov games has recently attracted increasing interest in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an $ϵ$-optimal Nash Equilibrium (NE) with a sample complexity of $O(H^{3} S A B / ϵ^{2})$, which is optimal in its dependence on the horizon $H$ and the number of states $S$ (where $A$ and $B$ denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms achieves such optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in…
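
To get a feel for how the stated rate $O(H^{3} S A B / ϵ^{2})$ scales, the following minimal sketch evaluates the leading term only. The function name `sample_bound` is hypothetical, and the constant and logarithmic factors hidden by the $O(\cdot)$ notation are dropped, so the numbers indicate relative growth rather than actual sample counts.

```python
def sample_bound(H: int, S: int, A: int, B: int, eps: float) -> float:
    """Leading term H^3 * S * A * B / eps^2 of the rate quoted in the abstract.

    Hypothetical helper for illustration: constants and log factors
    hidden by the O(.) notation are ignored.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return H**3 * S * A * B / eps**2


if __name__ == "__main__":
    base = sample_bound(H=10, S=20, A=5, B=5, eps=0.5)
    # Doubling the horizon multiplies the leading term by 2^3 = 8.
    print(sample_bound(H=20, S=20, A=5, B=5, eps=0.5) / base)
    # Halving the target accuracy multiplies it by 2^2 = 4.
    print(sample_bound(H=10, S=20, A=5, B=5, eps=0.25) / base)
```

The cubic dependence on $H$ is the expensive part: under this leading term, a horizon twice as long needs eight times as many samples for the same accuracy.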

Taxonomy

Topics: Reinforcement Learning in Robotics · Game Theory and Applications

Methods: Q-Learning