Off-Policy Correction For Multi-Agent Reinforcement Learning

Michał Zawalski, Błażej Osiński, Henryk Michalewski, Piotr Miłoś

arXiv:2111.11229 · cs.LG · April 4, 2024

TL;DR

This paper introduces MA-Trace, a scalable on-policy actor-critic algorithm for multi-agent reinforcement learning that extends V-Trace with importance-sampling off-policy correction. It comes with a fixed-point theorem that guarantees convergence and shows strong empirical performance on the StarCraft Multi-Agent Challenge.

Contribution

We propose MA-Trace, a novel scalable on-policy actor-critic algorithm for MARL that incorporates importance sampling for off-policy correction and provides theoretical convergence guarantees.

Findings

01. Achieves high performance on the StarCraft Multi-Agent Challenge
02. Exceeds state-of-the-art results on some tasks
03. Demonstrates scalability in multi-worker settings

Abstract

Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents. Despite apparent similarity to the single-agent case, multi-agent problems are often harder to train and analyze theoretically. In this work, we propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting. The key advantage of our algorithm is its high scalability in a multi-worker setting. To this end, MA-Trace utilizes importance sampling as an off-policy correction method, which allows distributing the computations with no impact on the quality of training. Furthermore, our algorithm is theoretically grounded - we prove a fixed-point theorem that guarantees convergence. We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms. MA-Trace achieves high performance on…
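The off-policy correction described in the abstract can be illustrated with a small NumPy sketch of V-Trace-style value targets using clipped importance weights. This is not the authors' implementation: the function names `vtrace_targets` and `joint_log_rho`, the clipping thresholds `rho_bar` and `c_bar`, and the assumption that the joint policy factorises across agents (so the joint importance ratio is the product of per-agent ratios) are all illustrative choices made here.

```python
import numpy as np


def joint_log_rho(target_logp, behaviour_logp):
    """Log importance ratio of the joint action, per time step.

    Both inputs have shape [T, n_agents] and hold log-probabilities of the
    actions actually taken. ASSUMPTION: the joint policy factorises across
    agents, so the joint log-ratio is the sum of per-agent log-ratios.
    """
    diff = np.asarray(target_logp, dtype=float) - np.asarray(behaviour_logp, dtype=float)
    return diff.sum(axis=1)


def vtrace_targets(values, bootstrap_value, rewards, log_rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-Trace-style value targets for one trajectory of length T.

    values:          V(x_t) for t = 0..T-1
    bootstrap_value: V(x_T), the value estimate after the last step
    rewards:         r_t for t = 0..T-1
    log_rhos:        log(pi(a_t|x_t) / mu(a_t|x_t)), target over behaviour policy
    """
    values = np.asarray(values, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    rhos = np.exp(np.asarray(log_rhos, dtype=float))

    clipped_rhos = np.minimum(rho_bar, rhos)  # weights on the TD errors
    cs = np.minimum(c_bar, rhos)              # trace-cutting coefficients

    next_values = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + gamma * next_values - values)

    # Backward recursion: (v_t - V_t) = delta_t + gamma * c_t * (v_{t+1} - V_{t+1})
    acc = 0.0
    corrections = np.zeros_like(values)
    for t in reversed(range(len(values))):
        acc = deltas[t] + gamma * cs[t] * acc
        corrections[t] = acc
    return values + corrections
```

When the behaviour and target policies coincide, every ratio equals 1 and the targets reduce to ordinary bootstrapped multi-step returns. When workers act with a stale policy, the clipped ratios down-weight or truncate the contribution of steps the current policy would be unlikely to take.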

Code & Models

Repositories

awarelab/seed_rl (TensorFlow, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Multi-Agent Systems and Negotiation

Methods: V-trace