Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Kaiqing Zhang; Sham M. Kakade; Tamer Başar; Lin F. Yang

arXiv:2007.07461 · cs.LG · August 10, 2023 · 24 cites

TL;DR

This paper analyzes the sample complexity of model-based multi-agent reinforcement learning in zero-sum Markov games, establishing near-optimal bounds and highlighting the tradeoffs between reward-agnostic and reward-aware algorithms.

Contribution

It provides the first tight sample complexity bounds for model-based MARL in zero-sum Markov games, including a minimax lower bound for reward-agnostic methods.

Findings

01

Sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for an $\epsilon$-Nash equilibrium

02

Reward-agnostic algorithms are nearly minimax optimal up to logarithmic factors

03

Tradeoff between reward-agnostic and reward-aware approaches in MARL
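
To get a feel for how the stated bound scales, the short sketch below evaluates |S||A||B|(1-γ)^{-3}ε^{-2} for concrete problem sizes. The function name `sample_budget` is hypothetical, and the result is a scaling guide only: the $\tilde{O}$ notation hides constants and logarithmic factors, which are dropped here.

```python
def sample_budget(n_states, n_a, n_b, gamma, eps):
    """Evaluate |S||A||B| * (1 - gamma)^-3 * eps^-2.

    Assumption: constants and log factors hidden by the O-tilde
    notation are ignored, so this is an order-of-magnitude guide,
    not an exact sample count.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    return n_states * n_a * n_b / ((1.0 - gamma) ** 3 * eps ** 2)
```

For example, `sample_budget(10, 2, 2, 0.5, 0.5)` evaluates to 1280.0, and halving `eps` to 0.25 quadruples it to 5120.0, which reflects the ε^{-2} dependence. The (1-γ)^{-3} factor grows much faster as γ approaches 1.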

Abstract

Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has long been recognized as one of the cornerstones of RL. It is especially suitable for multi-agent RL (MARL), as it naturally decouples the learning and planning phases, and avoids the non-stationarity problem that arises when all agents improve their policies simultaneously using samples. Though model-based MARL algorithms are intuitive and widely used, their sample complexity has not been fully investigated. In this paper, our goal is to address the fundamental question about its sample complexity. We study arguably the most basic MARL setting: two-player discounted zero-sum Markov games, given only access to a generative model. We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to…
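
The decoupling of learning and planning described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes two actions per player so that each stage game is a 2x2 matrix game with a closed-form value, and the names `matrix_game_value`, `estimate_model`, `plan`, and the `sample_next_state` callback (standing in for the generative model) are all hypothetical.

```python
import random


def matrix_game_value(M):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))  # row player's best pure guarantee
    minimax = min(max(a, c), max(b, d))  # column player's best pure guarantee
    if minimax - maximin <= 1e-12:       # pure saddle point exists
        return maximin
    # No saddle point: closed-form value under mixed strategies.
    return (a * d - b * c) / (a + d - b - c)


def estimate_model(sample_next_state, n_states, n_samples, rng):
    """Learning phase: build P_hat[s][a][b][s'] from n_samples
    generative-model calls per (s, a, b). Rewards are never used here."""
    P_hat = []
    for s in range(n_states):
        P_hat.append([])
        for a in range(2):
            P_hat[s].append([])
            for b in range(2):
                counts = [0] * n_states
                for _ in range(n_samples):
                    counts[sample_next_state(s, a, b, rng)] += 1
                P_hat[s][a].append([k / n_samples for k in counts])
    return P_hat


def plan(P, r, gamma, n_iters=200):
    """Planning phase: Shapley-style value iteration on model P
    with rewards r[s][a][b] and discount gamma."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(n_iters):
        V = [
            matrix_game_value([
                [r[s][a][b]
                 + gamma * sum(p * v for p, v in zip(P[s][a][b], V))
                 for b in range(2)]
                for a in range(2)
            ])
            for s in range(n_states)
        ]
    return V
```

A typical use would call `estimate_model(sampler, n_states, N, random.Random(0))` once and then `plan(P_hat, r, gamma)` on the resulting empirical model. Because the estimation step never looks at rewards, the same `P_hat` can be reused for planning under different reward functions.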

Videos

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Game Theory and Applications