Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
Kaiqing Zhang, Sham M. Kakade, Tamer Başar, Lin F. Yang

TL;DR
This paper analyzes the sample complexity of model-based multi-agent reinforcement learning in zero-sum Markov games, establishing near-optimal bounds and highlighting the tradeoffs between reward-agnostic and reward-aware algorithms.
Contribution
It provides the first tight sample complexity bounds for model-based MARL in zero-sum Markov games, including a minimax lower bound for reward-agnostic methods.
Findings
Sample complexity of Õ(|S||A||B|(1-γ)^{-3}ε^{-2}) for an ε-Nash equilibrium
Reward-agnostic algorithms are nearly minimax optimal up to logarithmic factors
Tradeoff between reward-agnostic and reward-aware approaches in MARL
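To see how the headline bound scales, the helper below evaluates |S||A||B|(1-γ)^{-3}ε^{-2}, where γ is the discount factor and |S|, |A|, |B| are the sizes of the state space and of the two players' action spaces. It is a minimal sketch that assumes the constants and logarithmic factors suppressed by the bound can be dropped, so its output is an order-of-magnitude guide, not a guaranteed sufficient sample size; the name `sample_budget` is hypothetical.

```python
def sample_budget(n_states, n_a, n_b, gamma, eps):
    """Order-of-magnitude total samples |S||A||B| / ((1 - gamma)^3 * eps^2).

    Constants and log factors are dropped (assumption), so this shows how the
    bound scales rather than a guaranteed sufficient sample size.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("discount factor gamma must lie in [0, 1)")
    if eps <= 0.0:
        raise ValueError("accuracy eps must be positive")
    per_triple = 1.0 / ((1.0 - gamma) ** 3 * eps ** 2)  # samples per (s, a, b)
    return n_states * n_a * n_b * per_triple


# The effective horizon 1/(1 - gamma) enters cubically, the accuracy quadratically.
base = sample_budget(10, 2, 2, gamma=0.9, eps=0.1)
print(f"{base:.3g}")
print(sample_budget(10, 2, 2, gamma=0.9, eps=0.05) / base)
print(sample_budget(10, 2, 2, gamma=0.99, eps=0.1) / base)
```

Halving ε multiplies the budget by 4, while moving γ from 0.9 to 0.99 multiplies it by 1000, which is why the (1-γ)^{-3} horizon dependence dominates for long-horizon games.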
Abstract
Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has long been recognized as one of the cornerstones of RL. It is especially suitable for multi-agent RL (MARL), as it naturally decouples the learning and planning phases and avoids the non-stationarity problem that arises when all agents improve their policies simultaneously using samples. Though intuitive and widely used, model-based MARL algorithms have not had their sample complexity fully investigated. In this paper, our goal is to address this fundamental question. We study arguably the most basic MARL setting: two-player discounted zero-sum Markov games, given only access to a generative model. We show that model-based MARL achieves a sample complexity of Õ(|S||A||B|(1-γ)^{-3}ε^{-2}) for finding the Nash equilibrium (NE) value up to…
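The model-based recipe described in the abstract (estimate an empirical model from generative-model samples, then plan on it) can be sketched as follows. This is a minimal plug-in sketch under stated assumptions, not the authors' algorithm: the sampler `toy_sampler` and the toy game are hypothetical, each player has exactly two actions so every stage matrix game has a closed-form value (larger action sets would need a linear-program solver), and the planner is Shapley-style value iteration, one standard choice of planning oracle. The sampling step never reads the reward, in the spirit of the reward-agnostic setting.

```python
import random


def matrix_game_value_2x2(m):
    """Exact value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best guarantee with a pure row
    minimax = min(max(a, c), max(b, d))  # best cap with a pure column
    if maximin == minimax:  # pure saddle point exists
        return maximin
    # No saddle point: both players mix, and the denominator is nonzero.
    return (a * d - b * c) / (a + d - b - c)


def estimate_model(sample_next_state, n_states, n_samples, rng):
    """Empirical kernel P_hat(s' | s, a, b) from n_samples draws per (s, a, b)."""
    p_hat = {}
    for s in range(n_states):
        for a in range(2):
            for b in range(2):
                counts = [0] * n_states
                for _ in range(n_samples):
                    counts[sample_next_state(s, a, b, rng)] += 1
                p_hat[(s, a, b)] = [k / n_samples for k in counts]
    return p_hat


def shapley_value_iteration(p, reward, n_states, gamma, iters=500):
    """Plan on a (possibly empirical) model: V(s) <- val[r(s, ., .) + gamma * P V]."""
    v = [0.0] * n_states
    for _ in range(iters):
        new_v = []
        for s in range(n_states):
            q = [[reward[s][a][b]
                  + gamma * sum(prob * v[t] for t, prob in enumerate(p[(s, a, b)]))
                  for b in range(2)] for a in range(2)]
            new_v.append(matrix_game_value_2x2(q))
        v = new_v
    return v


def toy_sampler(s, a, b, rng):
    """Hypothetical generative model: matching actions tend to keep the state."""
    stay = 0.8 if a == b else 0.3
    return s if rng.random() < stay else 1 - s


rng = random.Random(0)
reward = [[[1.0, -1.0], [-1.0, 1.0]], [[3.0, 1.0], [0.0, 2.0]]]
p_hat = estimate_model(toy_sampler, n_states=2, n_samples=1000, rng=rng)
v_hat = shapley_value_iteration(p_hat, reward, n_states=2, gamma=0.9)
print(v_hat)
```

Because the Shapley operator is a γ-contraction, the planning step converges on any fixed model; the sample-complexity question the paper studies is how large `n_samples` must be for the values computed on P̂ to be ε-close to those of the true game.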
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Game Theory and Applications
