Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games

Weichao Mao, Tamer Başar

arXiv:2110.05682·cs.LG·February 1, 2022

TL;DR

This paper introduces a decentralized reinforcement learning algorithm for general-sum Markov games that finds an approximate coarse correlated equilibrium (CCE) with a proven sample complexity bound.

Contribution

It presents the first sample complexity result for decentralized learning in general-sum Markov games using a novel combination of optimistic V-learning and online mirror descent.

Findings

01

Achieves an $\tilde{O}(H^6 S A / \epsilon^2)$ sample complexity for an $\epsilon$-approximate CCE.

02

The algorithm is fully decentralized, scalable, and requires only local information.

03

Introduces a new high-probability regret bound for online mirror descent with dynamic learning rates.
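To get a feel for how the bound in finding 01 scales, the short sketch below evaluates only its leading term, $H^6 S A / \epsilon^2$, ignoring the constants and logarithmic factors hidden by the $\tilde{O}$ notation. The function name `episode_bound_scale` and the example values of $H$, $S$, $A$, and $\epsilon$ are hypothetical, chosen purely for illustration; the result is a scaling indicator, not an episode count from the paper.

```python
def episode_bound_scale(H, S, A, eps):
    """Leading term H^6 * S * A / eps^2 of the stated bound.

    Constants and logarithmic factors are deliberately ignored, so the
    return value is only meaningful for comparing problem sizes.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return (H ** 6) * S * A / (eps ** 2)


# Hypothetical problem size, used only to compare scalings.
base = episode_bound_scale(H=10, S=20, A=4, eps=0.1)

# Halving the target accuracy eps multiplies the leading term by 4.
ratio_eps = episode_bound_scale(H=10, S=20, A=4, eps=0.05) / base

# Doubling the episode length H multiplies it by 2**6 = 64.
ratio_H = episode_bound_scale(H=20, S=20, A=4, eps=0.1) / base
```

Note that $A$ here is the size of the largest individual action space, so the leading term does not involve the joint action space of all agents.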

Abstract

This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning. Given the fundamental difficulty of calculating a Nash equilibrium (NE), we instead aim at finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents' strategies. We propose an algorithm in which each agent independently runs optimistic V-learning (a variant of Q-learning) to efficiently explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates. We show that the agents can find an $\epsilon$-approximate CCE in at most $\tilde{O}(H^6 S A / \epsilon^2)$ episodes, where $S$ is the number of states, $A$ is the size of the largest individual action space, and $H$ is the length of an episode. This…
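The per-agent procedure described in the abstract can be illustrated with a minimal sketch of one agent's local learner. This is a simplified illustration under stated assumptions, not the paper's algorithm: the policy update uses plain exponential weights with an importance-weighted loss estimate as a stand-in for the stabilized OMD subroutine, and the step size `alpha`, the bonus constant `bonus_scale`, and the learning rate `eta` are assumed schedules. The class name `VLearningAgent` and all method names are hypothetical. Rewards are assumed to lie in $[0, 1]$ and steps are indexed $h = 0, \dots, H-1$.

```python
import math
import random
from collections import defaultdict


class VLearningAgent:
    """One agent's local learner; it never observes other agents' actions."""

    def __init__(self, num_actions, horizon, bonus_scale=1.0, seed=0):
        self.A = num_actions
        self.H = horizon
        self.c = bonus_scale  # assumed constant, not taken from the paper
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)  # visit counts per (step, state)
        # Optimistic initialization: unvisited values start at the maximum H.
        self.v_tilde = defaultdict(lambda: float(horizon))
        # Log-weights of the exponential-weights policy per (step, state).
        self.log_w = defaultdict(lambda: [0.0] * num_actions)

    def value(self, h, s):
        """Optimistic value estimate, clipped to the largest feasible return."""
        if h >= self.H:
            return 0.0
        return min(float(self.H - h), self.v_tilde[(h, s)])

    def policy(self, h, s):
        """Softmax over log-weights (max subtracted for numerical stability)."""
        w = self.log_w[(h, s)]
        m = max(w)
        exp_w = [math.exp(x - m) for x in w]
        total = sum(exp_w)
        return [x / total for x in exp_w]

    def act(self, h, s):
        return self.rng.choices(range(self.A), weights=self.policy(h, s))[0]

    def update(self, h, s, a, r, s_next):
        key = (h, s)
        self.counts[key] += 1
        t = self.counts[key]
        alpha = (self.H + 1) / (self.H + t)  # assumed step-size schedule
        bonus = self.c * math.sqrt(self.H ** 3 * self.A / t)  # assumed bonus
        v_next = self.value(h + 1, s_next)
        # Optimistic value update from the agent's own reward only.
        self.v_tilde[key] = (1 - alpha) * self.v_tilde[key] + alpha * (r + v_next + bonus)
        # Bandit-style loss in [0, 1] for the action actually played.
        loss = (self.H - r - v_next) / self.H
        probs = self.policy(h, s)
        eta = math.sqrt(math.log(self.A) / (self.A * t))  # assumed learning rate
        # Importance-weighted estimate; adding eta keeps the estimate bounded.
        self.log_w[key][a] -= eta * loss / (probs[a] + eta)


# Usage: a zero-reward final step lowers the probability of the played action.
agent = VLearningAgent(num_actions=2, horizon=3)
agent.update(h=2, s="s", a=1, r=0.0, s_next="end")
p = agent.policy(2, "s")
```

Each agent would run its own instance and call `act` and `update` using only its own state, action, and reward, which is what makes the scheme decentralized. The sketch omits how the learned policies are combined into the final output policy.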

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization