Collaborative Min-Max Regret in Grouped Multi-Armed Bandits

Moïse Blanchard, Vineet Goyal

arXiv:2506.10313 · cs.LG · June 13, 2025

TL;DR

This paper introduces Col-UCB, an algorithm for shared exploration in grouped multi-armed bandits, where groups have overlapping feasible action sets and share reward observations. Col-UCB targets the collaborative regret, defined as the maximum regret across groups, so that the exploration burden is balanced between groups.

Contribution

The paper proposes Col-UCB, an algorithm that dynamically coordinates exploration across groups. It achieves optimal minimax and instance-dependent collaborative regret up to logarithmic factors, with bounds that adapt to the structure of the action sets shared between groups.

Findings

01. Col-UCB achieves minimax collaborative regret that is optimal up to logarithmic factors.

02. The benefit of collaboration depends on the structure of the shared action sets.

03. The regret bounds adapt to how the groups' action sets overlap.

Abstract

We study the impact of sharing exploration in multi-armed bandits in a grouped setting where a set of groups have overlapping feasible action sets [Baek and Farias '24]. In this grouped bandit setting, groups share reward observations, and the objective is to minimize the collaborative regret, defined as the maximum regret across groups. This naturally captures applications in which one aims to balance the exploration burden between groups or populations -- it is known that standard algorithms can lead to significantly imbalanced exploration cost between groups. We address this problem by introducing an algorithm Col-UCB that dynamically coordinates exploration across groups. We show that Col-UCB achieves both optimal minimax and instance-dependent collaborative regret up to logarithmic factors. These bounds are adaptive to the structure of shared action sets between groups, providing…
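To make the objective in the abstract concrete, the following minimal Python sketch simulates a grouped bandit with pooled reward observations and reports the collaborative regret as the maximum per-group regret. It is not Col-UCB: each group simply runs a standard UCB index on the pooled statistics, which is the kind of uncoordinated baseline the abstract says can leave exploration cost imbalanced. The function names, the Bernoulli reward model, the bonus constant, and the example instance are all assumptions made for illustration.

```python
import math
import random


def simulate_shared_ucb(means, group_arms, horizon, seed=0):
    """Naive shared-data UCB baseline (an illustration, NOT Col-UCB).

    means:      true mean of each arm (Bernoulli rewards, assumed here).
    group_arms: one list of feasible arm indices per group.
    Returns the cumulative pseudo-regret of each group.
    """
    rng = random.Random(seed)
    counts = [0] * len(means)    # pulls pooled over all groups
    sums = [0.0] * len(means)    # reward sums pooled over all groups
    # Each group's benchmark is the best arm in its own feasible set.
    best = [max(means[a] for a in arms) for arms in group_arms]
    regret = [0.0] * len(group_arms)

    def ucb_index(arm, t):
        if counts[arm] == 0:
            return float("inf")  # force one pull of every unseen arm
        bonus = math.sqrt(2.0 * math.log(t + 1) / counts[arm])
        return sums[arm] / counts[arm] + bonus

    for t in range(1, horizon + 1):
        for g, arms in enumerate(group_arms):
            arm = max(arms, key=lambda a: ucb_index(a, t))
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1     # the observation is shared with all groups
            sums[arm] += reward
            regret[g] += best[g] - means[arm]
    return regret


def collaborative_regret(per_group_regret):
    """Collaborative regret: the maximum regret across groups."""
    return max(per_group_regret)


if __name__ == "__main__":
    # Hypothetical instance: arm 1 is feasible for both groups.
    per_group = simulate_shared_ucb(
        means=[0.5, 0.6, 0.7], group_arms=[[0, 1], [1, 2]], horizon=2000
    )
    print("per-group regret:", per_group)
    print("collaborative regret:", collaborative_regret(per_group))
```

Comparing the per-group values shows which group carries the exploration cost on a given instance; the collaborative objective only looks at the largest of them.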

Taxonomy

Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Recommender Systems and Techniques

Methods: Sparse Evolutionary Training