Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning

Yulai Zhao; Zhuoran Yang; Zhaoran Wang; Jason D. Lee

arXiv:2305.04819 · cs.LG · May 9, 2023 · 2 cites

TL;DR

This paper introduces a provably convergent multi-agent PPO algorithm that leverages local optimization to achieve global optimality in cooperative Markov games, supported by theoretical guarantees and experimental validation.

Contribution

The paper presents what the authors describe as the first provably convergent multi-agent PPO algorithm for cooperative Markov games, and extends it to the off-policy setting by introducing pessimism into policy evaluation.

Findings

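01. The algorithm converges to the globally optimal policy at a sublinear rate.

02. The off-policy extension introduces pessimism into policy evaluation, which aligns with the experiments.

03. It is the first provably convergent multi-agent PPO algorithm in cooperative Markov games, to the authors' knowledge.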

Abstract

Policy optimization methods with function approximation are widely used in multi-agent reinforcement learning. However, it remains elusive how to design such algorithms with statistical guarantees. Leveraging a multi-agent performance difference lemma that characterizes the landscape of multi-agent policy optimization, we find that the localized action value function serves as an ideal descent direction for each local policy. Motivated by the observation, we present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO. We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate. We extend our algorithm to the off-policy setting and introduce pessimism to policy evaluation, which aligns with experiments. To our knowledge,…
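The local update described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation: it assumes a single-state cooperative game with a known shared reward matrix, exact local action values instead of learned function approximators, and a KL-regularized (mirror-descent) step of the form pi_new ∝ pi_old · exp(eta · Q_local) as a stand-in for the PPO-style update. The names `kl_regularized_step`, `run_local_updates`, `reward`, and `eta` are hypothetical, and conditioning agent 2 on agent 1's action is an illustrative modeling choice.

```python
import numpy as np


def kl_regularized_step(log_pi_old, q_local, eta):
    """One KL-regularized step in log space: pi_new ∝ pi_old * exp(eta * q_local)."""
    logits = log_pi_old + eta * q_local
    # Normalize with a stable log-sum-exp over the last (action) axis.
    m = logits.max(axis=-1, keepdims=True)
    log_z = m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True))
    return logits - log_z


def run_local_updates(reward, eta=1.0, num_iters=200):
    """Two agents share `reward[a1, a2]`; each one only updates its own policy."""
    n1, n2 = reward.shape
    log_pi1 = np.full(n1, -np.log(n1))        # agent 1: log pi1[a1]
    log_pi2 = np.full((n1, n2), -np.log(n2))  # agent 2: log pi2[a1, a2], given a1
    for _ in range(num_iters):
        # Agent 1's local value averages the reward over agent 2's current policy.
        q1 = (np.exp(log_pi2) * reward).sum(axis=1)
        # Agent 2's local value is the reward itself, conditioned on a1.
        q2 = reward
        log_pi1 = kl_regularized_step(log_pi1, q1, eta)
        log_pi2 = kl_regularized_step(log_pi2, q2, eta)
    return np.exp(log_pi1), np.exp(log_pi2)


# Hypothetical reward matrix with a local peak at (0, 0) and the global peak at (2, 2).
reward = np.array([[1.0, 0.0, 0.2],
                   [0.0, 0.5, 0.3],
                   [0.2, 0.3, 2.0]])
pi1, pi2 = run_local_updates(reward)
joint = pi1[:, None] * pi2  # joint[a1, a2] = pi1[a1] * pi2[a2 | a1]
best = tuple(int(i) for i in np.unravel_index(joint.argmax(), joint.shape))
```

In this toy game the joint policy concentrates on the highest-reward cell even though each agent only ascends its own local value. The paper's guarantees concern the general Markov-game setting under its stated regularity conditions, which this sketch does not reproduce.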

Code & Models

Repositories

zhaoyl18/ratio_game (official)

Videos

Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Game Theory and Applications · Distributed Control Multi-Agent Systems

Methods: Entropy Regularization · Proximal Policy Optimization