TL;DR
This paper introduces multi-agent Thompson sampling (MATS), a Bayesian algorithm for multi-agent bandit problems in which agents interact only sparsely. MATS exploits this sparsity to coordinate more efficiently and outperforms existing methods such as MAUCE in synthetic and real-world wind farm scenarios.
Contribution
We propose MATS, a novel Bayesian exploration-exploitation algorithm tailored for loosely-coupled multi-agent bandit problems, with theoretical regret bounds and superior empirical performance.
Findings
MATS has a regret bound that is sublinear in time for sparse (loosely-coupled) multi-agent settings.
MATS outperforms the state-of-the-art algorithm MAUCE on benchmarks.
Application to wind farm control demonstrates practical benefits of MATS.
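For readers unfamiliar with the term, the standard notion of cumulative regret behind "sublinear regret" can be written as below. The symbols are assumptions chosen for illustration and are not taken from the paper: \mu(a) is the expected reward of joint action a, a^{*} is the best joint action, and a_t is the joint action played at step t.

```latex
R(T) = \sum_{t=1}^{T} \bigl( \mu(a^{*}) - \mu(a_t) \bigr),
\qquad
\lim_{T \to \infty} \frac{\mathbb{E}[R(T)]}{T} = 0
```

The second condition is what "sublinear in time" means: the average per-step loss relative to the best joint action vanishes as the horizon grows.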
Abstract
Multi-agent coordination is prevalent in many real-world applications. However, such coordination is challenging due to its combinatorial nature. An important observation in this regard is that agents in the real world often only directly affect a limited set of neighbouring agents. Leveraging such loose couplings among agents is key to making coordination in multi-agent systems feasible. In this work, we focus on learning to coordinate. Specifically, we consider the multi-agent multi-armed bandit framework, in which fully cooperative loosely-coupled agents must learn to coordinate their decisions to optimize a common objective. We propose multi-agent Thompson sampling (MATS), a new Bayesian exploration-exploitation algorithm that leverages loose couplings. We provide a regret bound that is sublinear in time and low-order polynomial in the highest number of actions of a single agent for…
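To make the setting concrete, here is a minimal sketch of Thompson sampling over a reward that decomposes into per-group local rewards. It is an illustration under stated assumptions, not the authors' implementation: the function name `mats_sketch`, the Bernoulli rewards with Beta(1, 1) priors, the toy problem, and the brute-force search over joint actions are all hypothetical choices. Brute force is only feasible for tiny problems; a scalable version would need a maximisation method that exploits the group structure.

```python
import itertools
import random


def mats_sketch(groups, n_actions, local_reward, horizon, seed=0):
    """Thompson sampling over a factored reward (illustrative sketch).

    groups:       list of tuples of agent indices, one tuple per group
    n_actions:    list with the number of actions of each agent
    local_reward: callable (group_index, local_joint_action, rng) -> 0 or 1
    """
    rng = random.Random(seed)
    # Assumption: Bernoulli local rewards with a Beta(1, 1) prior
    # for every local joint action of every group.
    posts = [
        {la: [1, 1]
         for la in itertools.product(*(range(n_actions[i]) for i in g))}
        for g in groups
    ]
    joint_actions = list(itertools.product(*(range(k) for k in n_actions)))
    history = []
    for _ in range(horizon):
        # 1. Sample a mean for each local joint action from its posterior.
        sampled = [
            {la: rng.betavariate(a, b) for la, (a, b) in p.items()}
            for p in posts
        ]

        # 2. Pick the joint action that maximises the sum of sampled
        #    local means (brute-force enumeration, a stand-in).
        def value(joint):
            return sum(
                s[tuple(joint[i] for i in g)]
                for s, g in zip(sampled, groups)
            )

        joint = max(joint_actions, key=value)

        # 3. Play it, observe one reward per group, update posteriors.
        for e, g in enumerate(groups):
            la = tuple(joint[i] for i in g)
            r = local_reward(e, la, rng)
            posts[e][la][0] += r
            posts[e][la][1] += 1 - r
        history.append(joint)
    return history, posts


# Hypothetical toy problem: three agents in a chain, two overlapping groups.
def toy_reward(group_index, local_action, rng):
    p = 0.9 if local_action == (1, 1) else 0.2
    return 1 if rng.random() < p else 0


history, posts = mats_sketch(
    groups=[(0, 1), (1, 2)],
    n_actions=[2, 2, 2],
    local_reward=toy_reward,
    horizon=500,
)
```

Each group keeps posteriors only over its own local joint actions, so the number of quantities to learn grows with the group sizes rather than with the full joint action space. That is the benefit of loose coupling the abstract refers to.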
