Distributed Bandits with Heterogeneous Agents
Lin Yang, Yu-zhen Janice Chen, Mohammad Hajiesmaili, John CS Lui, Don Towsley

TL;DR
This paper introduces two algorithms for heterogeneous multi-agent bandit problems, achieving near-optimal regret and low communication complexity, even with asynchronous agents and limited local information.
Contribution
The paper proposes two algorithms, CO-UCB and CO-AAE, for heterogeneous multi-agent bandits. Both come with order-optimal regret bounds, and CO-AAE additionally keeps communication complexity low, advancing multi-agent learning theory.
Findings
Both algorithms achieve order-optimal regret bounds.
CO-AAE reduces communication complexity to O(log T).
Numerical experiments confirm the algorithms' efficiency.
Abstract
This paper tackles a multi-agent bandit setting where agents cooperate to solve the same instance of a $K$-armed stochastic bandit problem. The agents are heterogeneous: each agent has limited access to a local subset of arms, and the agents are asynchronous, with different gaps between decision-making rounds. The goal for each agent is to find its optimal local arm, and agents can cooperate by sharing their observations with others. While cooperation between agents improves learning performance, it adds the complexity of communication between agents. For this heterogeneous multi-agent setting, we propose two learning algorithms, CO-UCB and CO-AAE. We prove that both algorithms achieve order-optimal regret, which is $O\left(\sum_{i:\tilde{\Delta}_i>0} \log T/\tilde{\Delta}_i\right)$, where $\tilde{\Delta}_i$ is the minimum suboptimality gap…
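The setting described in the abstract can be made concrete with a small simulation. The sketch below is not the paper's algorithm: it is a simplified illustration in which every agent sees every observation immediately (unlimited communication), rounds are synchronous, rewards are Bernoulli, and each agent runs a standard UCB1 index over its own local arm subset. The function name `run_shared_ucb`, its parameters, and the example instance are all hypothetical.

```python
import math
import random


def ucb_index(arm, counts, sums, total):
    """Standard UCB1 index computed from the pooled statistics of one arm."""
    mean = sums[arm] / counts[arm]
    bonus = math.sqrt(2.0 * math.log(total) / counts[arm])
    return mean + bonus


def run_shared_ucb(means, local_arms, horizon, seed=0):
    """Simulate agents that pool observations and play UCB1 on local arms.

    means      -- true Bernoulli mean of every arm (hypothetical instance)
    local_arms -- local_arms[j] lists the arm indices agent j may pull
    horizon    -- number of synchronous rounds (asynchrony is ignored here)
    Returns (pull counts per agent per arm, cumulative regret per agent).
    """
    rng = random.Random(seed)
    num_arms = len(means)
    counts = [0] * num_arms   # pooled pull counts, shared by all agents
    sums = [0.0] * num_arms   # pooled reward sums, shared by all agents
    total = 0                 # total pulls made by all agents so far
    pulls = [[0] * num_arms for _ in local_arms]
    regret = [0.0] * len(local_arms)

    for _ in range(horizon):
        for j, arms in enumerate(local_arms):
            untried = [a for a in arms if counts[a] == 0]
            if untried:
                arm = untried[0]  # pull each local arm once before using UCB
            else:
                arm = max(arms, key=lambda a: ucb_index(a, counts, sums, total))
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            total += 1
            pulls[j][arm] += 1
            # Regret is measured against the agent's best *local* arm.
            regret[j] += max(means[a] for a in arms) - means[arm]
    return pulls, regret


if __name__ == "__main__":
    # Agent 0 can pull arms 0 and 1; agent 1 can pull arms 1 and 2.
    pulls, regret = run_shared_ucb([0.9, 0.5, 0.2], [[0, 1], [1, 2]], 2000)
    print(pulls, regret)
```

Because arm 1 is shared, agent 0 benefits from agent 1's pulls of that arm and needs few of its own to rule it out. Sending every observation to every agent costs communication that grows with the horizon, which is the overhead the abstract says cooperation introduces.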
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Game Theory and Applications
