Cooperative Bandit Learning in Directed Networks with Arm-Access Constraints

Evagoras Makridis; Themistoklis Charalambous

arXiv:2603.22881·eess.SY·March 25, 2026

Open Access

TL;DR

This paper introduces a distributed UCB algorithm for multi-agent bandit problems with heterogeneous arm access and directed communication networks, achieving logarithmic regret while accounting for network and accessibility constraints.

Contribution

The paper presents a consensus-based UCB method that handles partial arm access and asymmetric (directed) communication, and it comes with theoretical regret guarantees.

Findings

01 Logarithmic regret bounds are established for each agent.

02 The algorithm's performance depends on network mixing properties.

03 Heterogeneous arm access influences cooperative learning efficiency.
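
One common way to quantify the "network mixing properties" mentioned in the findings is the second-largest eigenvalue modulus (SLEM) of the consensus weight matrix: the smaller it is, the faster information spreads. The sketch below is illustrative only and is not taken from the paper. The function name and the example matrix `W_ring` are hypothetical, and the paper may characterize mixing with a different quantity.

```python
import numpy as np

def second_largest_eigenvalue_modulus(W):
    """Return the second-largest eigenvalue modulus of a square weight matrix.

    Assumption: W is row-stochastic (rows sum to 1), so its largest
    eigenvalue modulus is 1 and the next one governs the mixing speed.
    """
    eigvals = np.linalg.eigvals(np.asarray(W, dtype=float))
    moduli = np.sort(np.abs(eigvals))[::-1]  # descending order
    return float(moduli[1])

# Hypothetical 3-agent directed ring: each agent averages its own value
# with the value received from exactly one in-neighbor.
W_ring = np.array([
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
])

slem_ring = second_largest_eigenvalue_modulus(W_ring)
```

A value close to 1 indicates slow mixing, and a value close to 0 indicates that agents' estimates agree after very few exchanges.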

Abstract

Sequential decision-making under uncertainty often involves multiple agents learning which actions (arms) yield the highest rewards through repeated interaction with a stochastic environment. This setting is commonly modeled by cooperative multi-agent multi-armed bandit problems, where agents explore and share information without centralized coordination. In many realistic systems, agents have heterogeneous capabilities that limit their access to subsets of arms and communicate over asymmetric networks represented by directed graphs. In this work, we study multi-agent multi-armed bandit problems with partial arm access, where agents explore and exploit only the arms available to them while exchanging information with neighbors. We propose a distributed consensus-based upper confidence bound (UCB) algorithm that accounts for both the arm accessibility structure and network asymmetry. Our…
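
To make the setting in the abstract concrete, here is a minimal sketch of a generic consensus-based UCB loop with per-agent arm masks and a directed weight matrix. It is not the authors' algorithm: the function name `run_coop_ucb`, the Bernoulli reward model, the row-stochastic mixing step, and the bonus constant are all assumptions chosen for illustration.

```python
import numpy as np

def run_coop_ucb(means, access, W, horizon, seed=0):
    """Hypothetical sketch of consensus-based UCB with arm-access masks.

    means   : (K,) true Bernoulli means of the arms (assumed reward model)
    access  : (N, K) boolean mask, access[i, k] is True if agent i may pull arm k
              (assumption: every agent can access at least one arm)
    W       : (N, N) row-stochastic weights, W[i, j] > 0 only if agent i
              receives from agent j, with W[i, i] > 0 (assumed mixing rule)
    Returns the (N, K) integer array of local pull counts.
    """
    rng = np.random.default_rng(seed)
    n_agents, n_arms = access.shape
    n_hat = np.zeros((n_agents, n_arms))  # consensus estimate of pull counts
    s_hat = np.zeros((n_agents, n_arms))  # consensus estimate of reward sums
    pulls = np.zeros((n_agents, n_arms), dtype=int)

    for t in range(1, horizon + 1):
        new_n = np.zeros_like(n_hat)
        new_s = np.zeros_like(s_hat)
        for i in range(n_agents):
            allowed = np.flatnonzero(access[i])
            untried = allowed[pulls[i, allowed] == 0]
            if untried.size > 0:
                # Pull each accessible arm once before using the index.
                arm = untried[0]
            else:
                counts = np.maximum(n_hat[i, allowed], 1e-9)
                mean_est = s_hat[i, allowed] / counts
                bonus = np.sqrt(2.0 * np.log(t) / counts)
                # Maximize the UCB index over accessible arms only.
                arm = allowed[np.argmax(mean_est + bonus)]
            reward = float(rng.random() < means[arm])
            pulls[i, arm] += 1
            new_n[i, arm] = 1.0
            new_s[i, arm] = reward
        # Add the fresh local observations, then mix along the directed edges.
        n_hat = W @ (n_hat + new_n)
        s_hat = W @ (s_hat + new_s)
    return pulls
```

Each agent only ever selects from its own accessible arms, yet its estimates for those arms also draw on observations relayed by in-neighbors. Whether this simple row-stochastic averaging yields unbiased estimates on a general directed graph is exactly the kind of issue a careful design has to address, so treat the mixing step here as a placeholder.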

Taxonomy

Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Reinforcement Learning in Robotics