Learning to Coordinate Under Threshold Rewards: A Cooperative Multi-Agent Bandit Framework

Michael Ledford; William Regli

arXiv:2506.15856·cs.MA·June 23, 2025

Open Access

TL;DR

This paper introduces a decentralized multi-agent bandit algorithm that learns activation thresholds and avoids decoys, enabling effective coordination under threshold-based rewards and outperforming baseline methods.

Contribution

The paper presents T-Coop-UCB (Threshold-Coop-UCB), a decentralized algorithm with which agents jointly learn activation thresholds and reward distributions in multi-agent bandit problems that include decoy arms, forming coalitions without centralized control.

Findings

01. T-Coop-UCB outperforms baseline methods in cumulative reward.

02. The algorithm achieves low regret and effective coordination.

03. Near-Oracle performance demonstrates its efficiency.

Abstract

Cooperative multi-agent systems often face tasks that require coordinated actions under uncertainty. While multi-armed bandit (MAB) problems provide a powerful framework for decentralized learning, most prior work assumes individually attainable rewards. We address the challenging setting where rewards are threshold-activated: an arm yields a payoff only when a minimum number of agents pull it simultaneously, with this threshold unknown in advance. Complicating matters further, some arms are decoys - requiring coordination to activate but yielding no reward - introducing a new challenge of wasted joint exploration. We introduce Threshold-Coop-UCB (T-Coop-UCB), a decentralized algorithm that enables agents to jointly learn activation thresholds and reward distributions, forming effective coalitions without centralized control. Empirical results show that T-Coop-UCB consistently…
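
The threshold-activated reward model described in the abstract can be illustrated with a small simulation. This is a minimal sketch of the environment only, not the authors' T-Coop-UCB algorithm. The names `Arm` and `step`, the Bernoulli payoff, and the convention that a decoy is an arm with mean reward 0.0 are all assumptions made for illustration; the abstract does not specify reward distributions or how a payoff is shared among agents.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Arm:
    # Hypothetical fields: the abstract only says a threshold exists and is unknown to agents.
    threshold: int      # minimum number of simultaneous pulls needed to activate the arm
    mean_reward: float  # assumed Bernoulli success probability once activated; 0.0 models a decoy


def step(arms, choices, rng):
    """Simulate one round.

    choices[i] is the index of the arm pulled by agent i.
    Returns a dict mapping each pulled arm index to the payoff it produced.
    """
    counts = Counter(choices)
    payoffs = {}
    for idx, n_pulls in counts.items():
        arm = arms[idx]
        if n_pulls < arm.threshold:
            # Too few agents pulled together: the arm stays inactive and pays nothing.
            payoffs[idx] = 0.0
        else:
            # Activated: draw an (assumed) Bernoulli payoff. A decoy never pays.
            payoffs[idx] = 1.0 if rng.random() < arm.mean_reward else 0.0
    return payoffs


arms = [
    Arm(threshold=2, mean_reward=1.0),  # pays only when at least two agents coordinate
    Arm(threshold=2, mean_reward=0.0),  # decoy: needs coordination, yields nothing
    Arm(threshold=1, mean_reward=1.0),  # individually attainable arm
]
rng = random.Random(0)
coordinated = step(arms, [0, 0, 1, 1], rng)  # two agents on arm 0, two on the decoy
scattered = step(arms, [0, 1, 2], rng)       # one agent per arm
```

In this sketch a lone agent on arm 0 observes the same zero payoff as a pair of agents on the decoy, which is what makes joint exploration of decoys wasteful: zero reward alone does not reveal whether the threshold was missed or the arm is worthless.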

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing