Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning
Hao-Lun Hsu, Weixin Wang, Miroslav Pajic, Pan Xu

TL;DR
This paper introduces a unified framework for randomized exploration in cooperative multi-agent reinforcement learning, providing the first provable regret guarantees for this setting and demonstrating improved empirical performance across diverse environments.
Contribution
It proposes a unified algorithm framework with two Thompson Sampling-based algorithms, CoopTS-PHE and CoopTS-LMC, achieving provable regret bounds in cooperative MARL.
Findings
Achieves a regret bound of $\widetilde{O}(d^{3/2}H^2\sqrt{MK})$ in linear transition MDPs.
Demonstrates superior performance on multiple RL benchmarks, including a deep exploration problem, a video game, and an energy system.
Establishes a connection between the framework and federated learning applications.
Abstract
We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, incorporating the perturbed-history exploration (PHE) strategy and the Langevin Monte Carlo exploration (LMC) strategy, respectively, which are flexible in design and easy to implement in practice. For a special class of parallel MDPs where the transition is (approximately) linear, we theoretically prove that both CoopTS-PHE and CoopTS-LMC achieve a $\widetilde{O}(d^{3/2}H^2\sqrt{MK})$ regret bound with communication complexity $\widetilde{O}(dHM^2)$, where $d$ is the feature dimension, $H$ is the horizon length, $M$ is the number of agents, and $K$ is…
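To make the two exploration strategies named in the abstract more concrete, here is a minimal single-agent sketch in Python for a linear regression model. It is an illustration under stated assumptions, not the paper's CoopTS-PHE or CoopTS-LMC: the function names, the ridge loss, the noise scales, and the hyperparameters (`lam`, `sigma`, `step`, `beta`) are all hypothetical choices, and the multi-agent communication step is omitted entirely. PHE refits the model on noise-perturbed targets, while LMC runs noisy gradient descent on the same loss.

```python
import numpy as np


def ridge_solution(Phi, y, lam):
    """Closed-form minimizer of ||Phi w - y||^2 + lam * ||w||^2."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)


def phe_sample(Phi, y, lam, sigma, rng):
    """Perturbed-history exploration (illustrative form).

    Adds Gaussian noise to every regression target and to the
    regularizer, then solves the perturbed ridge problem. With these
    noise scales (an assumption of this sketch) the result is
    distributed as N(w_hat, sigma^2 * A^{-1}), A = Phi^T Phi + lam * I.
    """
    n, d = Phi.shape
    y_tilde = y + sigma * rng.standard_normal(n)          # perturbed targets
    prior_noise = sigma * np.sqrt(lam) * rng.standard_normal(d)
    A = Phi.T @ Phi + lam * np.eye(d)
    return np.linalg.solve(A, Phi.T @ y_tilde + prior_noise)


def lmc_sample(Phi, y, lam, w0, step, beta, n_steps, rng):
    """Langevin Monte Carlo on the ridge loss (illustrative form).

    Each iteration is a gradient step plus Gaussian noise of scale
    sqrt(2 * step / beta); for a small step size the iterates
    approximately sample from exp(-beta * loss).
    """
    w = np.array(w0, dtype=float)
    for _ in range(n_steps):
        grad = 2.0 * (Phi.T @ (Phi @ w - y) + lam * w)    # gradient of ridge loss
        noise = np.sqrt(2.0 * step / beta) * rng.standard_normal(w.shape)
        w = w - step * grad + noise
    return w
```

In both cases the sampled weight vector would stand in for a posterior sample in a Thompson Sampling loop: an agent acts greedily with respect to the sampled model, so the injected noise drives exploration. Setting `sigma = 0` in `phe_sample`, or letting `beta` grow very large in `lmc_sample`, recovers the plain ridge estimate.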
Taxonomy
Topics: Reinforcement Learning in Robotics · Metaheuristic Optimization Algorithms Research
