Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

Hao-Lun Hsu; Weixin Wang; Miroslav Pajic; Pan Xu

arXiv:2404.10728 · cs.LG · March 4, 2025 · 1 citation

TL;DR

This paper introduces a new framework for randomized exploration in cooperative multi-agent reinforcement learning, providing the first theoretical guarantees and demonstrating improved empirical performance across diverse environments.

Contribution

It proposes a unified algorithm framework with two Thompson Sampling-based algorithms, CoopTS-PHE and CoopTS-LMC, achieving provable regret bounds in cooperative MARL.
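The two exploration strategies named above can be illustrated on a toy linear regression. This is a minimal sketch, assuming a plain ridge-regression objective in place of the paper's value-function regression; the function names `phe_estimate` and `lmc_estimate` and the parameters `lam`, `sigma`, `beta`, `step`, and `n_steps` are hypothetical choices for illustration, not the authors' implementation. Perturbed-history exploration refits the regression on noise-perturbed targets, while Langevin Monte Carlo runs noisy gradient descent on the same loss.

```python
import numpy as np


def phe_estimate(Phi, y, lam=1.0, sigma=1.0, rng=None):
    """Perturbed-history sketch: ridge regression on noise-perturbed data."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = Phi.shape
    y_tilde = y + sigma * rng.standard_normal(n)  # perturb every observed target
    xi = sigma * rng.standard_normal(d)           # perturb the regularizer
    A = Phi.T @ Phi + lam * np.eye(d)
    # With this scaling the randomized solution has covariance sigma^2 * A^{-1}
    # around the ordinary ridge estimate.
    return np.linalg.solve(A, Phi.T @ y_tilde + np.sqrt(lam) * xi)


def lmc_estimate(Phi, y, lam=1.0, beta=1.0, step=None, n_steps=200, rng=None):
    """Langevin Monte Carlo sketch: noisy gradient descent on the ridge loss."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = Phi.shape
    A = Phi.T @ Phi + lam * np.eye(d)
    b = Phi.T @ y
    if step is None:
        # Conservative step size chosen so the gradient step is a contraction.
        step = 0.5 / np.linalg.eigvalsh(A).max()
    w = np.zeros(d)
    for _ in range(n_steps):
        grad = 2.0 * (A @ w - b)  # gradient of ||Phi w - y||^2 + lam * ||w||^2
        noise = np.sqrt(2.0 * step / beta) * rng.standard_normal(d)
        w = w - step * grad + noise
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((200, 3))           # synthetic features
    y = Phi @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(200)
    print("PHE sample:", phe_estimate(Phi, y, rng=rng))
    print("LMC sample:", lmc_estimate(Phi, y, rng=rng))
```

Each call returns one randomized parameter estimate; acting greedily with respect to such a sample is what makes both approaches Thompson Sampling-style. Setting `sigma=0` in the first function, or a very large `beta` in the second, recovers the ordinary ridge solution.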

Findings

01

Achieves a regret bound of $\widetilde{O}(d^{3/2} H^{2} \sqrt{MK})$ in linear transition MDPs.

02

Demonstrates superior performance on multiple RL benchmarks, including a deep exploration problem, a video game, and an energy system.

03

Establishes a connection between the framework and federated learning applications.
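One way to read the regret bound in finding 01, assuming (the summary does not state this) that regret is accumulated over all $M$ agents and $K$ episodes, is to divide by the total number of agent-episodes $MK$:

```latex
\frac{\widetilde{O}\!\left(d^{3/2} H^{2} \sqrt{MK}\right)}{MK}
  = \widetilde{O}\!\left(\frac{d^{3/2} H^{2}}{\sqrt{MK}}\right)
  \;\longrightarrow\; 0 \quad \text{as } MK \to \infty .
```

Under that assumption, the average regret per agent-episode shrinks at rate $1/\sqrt{MK}$, so adding agents helps in the same way as running more episodes.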

Abstract

We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, incorporating the perturbed-history exploration (PHE) strategy and the Langevin Monte Carlo exploration (LMC) strategy, respectively, which are flexible in design and easy to implement in practice. For a special class of parallel MDPs where the transition is (approximately) linear, we theoretically prove that both CoopTS-PHE and CoopTS-LMC achieve a $\widetilde{O}(d^{3/2} H^{2} \sqrt{MK})$ regret bound with communication complexity $O(d H M^{2})$, where $d$ is the feature dimension, $H$ is the horizon length, $M$ is the number of agents, and $K$ is…

Videos

Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Metaheuristic Optimization Algorithms Research