Heterogeneous Multi-Agent Bandits with Parsimonious Hints

Amirmahdi Mirfakhar; Xuchuang Wang; Jinhang Zuo; Yair Zick; Mohammad Hajiesmaili

arXiv:2502.16128·cs.LG·February 28, 2025

Open Access

TL;DR

This paper introduces a new multi-agent bandit framework with hints, proposing algorithms for centralized and decentralized settings that minimize hints while achieving near-optimal regret, supported by theoretical bounds and simulations.

Contribution

It develops novel algorithms for heterogeneous multi-agent bandits with hints, providing regret and hint complexity bounds, and establishes their optimality through lower bounds.

Findings

01. GP-HCLA achieves $O(M^4 K)$ regret with $O(MK \log T)$ hints.

02. Decentralized algorithms reach $O(M^3 K^2)$ regret with $O(M^3 K \log T)$ hints.

03. Lower bounds confirm the optimality of proposed algorithms.

Abstract

We study a hinted heterogeneous multi-agent multi-armed bandits problem (HMA2B), where agents can query low-cost observations (hints) in addition to pulling arms. In this framework, each of the $M$ agents has a unique reward distribution over $K$ arms, and in $T$ rounds, they can observe the reward of the arm they pull only if no other agent pulls that arm. The goal is to maximize the total utility by querying the minimal necessary hints without pulling arms, achieving time-independent regret. We study HMA2B in both centralized and decentralized setups. Our main centralized algorithm, GP-HCLA, which is an extension of HCLA, uses a central decision-maker for arm-pulling and hint queries, achieving $O(M^{4} K)$ regret with $O(M K \log T)$ adaptive hints. In decentralized setups, we propose two algorithms, HD-ETC and EBHD-ETC, that allow agents to choose actions independently through…
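The collision rule and the role of hints described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's GP-HCLA, HD-ETC, or EBHD-ETC; it only models one round of the environment under stated assumptions: rewards are Bernoulli, a collided agent observes nothing, a hint is a single sample of an (agent, arm) pair, and the benchmark is the collision-free assignment with the highest total mean. The names `play_round`, `query_hint`, and `best_matching` are hypothetical.

```python
import itertools
import random


def play_round(means, choices, rng):
    """Simulate one round in which agent m pulls arm choices[m].

    means[m][k] is agent m's mean reward on arm k (heterogeneous across agents).
    Returns one entry per agent: a Bernoulli sample if the agent was alone on
    its arm, or None if it collided (assumption: a collision reveals nothing).
    """
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1
    observed = []
    for m, arm in enumerate(choices):
        if counts[arm] > 1:
            observed.append(None)
        else:
            observed.append(1 if rng.random() < means[m][arm] else 0)
    return observed


def query_hint(means, agent, arm, rng):
    """Hypothetical hint: sample (agent, arm) without pulling the arm."""
    return 1 if rng.random() < means[agent][arm] else 0


def best_matching(means):
    """Brute-force the collision-free assignment with the highest total mean.

    Only practical for tiny M and K; it serves as the regret benchmark here.
    """
    num_agents, num_arms = len(means), len(means[0])
    best = max(
        itertools.permutations(range(num_arms), num_agents),
        key=lambda p: sum(means[m][p[m]] for m in range(num_agents)),
    )
    total = sum(means[m][best[m]] for m in range(num_agents))
    return list(best), total
```

For example, with `means = [[0.9, 0.1, 0.5], [0.8, 0.7, 0.2]]`, both agents prefer arm 0, but `best_matching(means)` assigns agent 0 to arm 0 and agent 1 to arm 1, since sharing arm 0 would cause a collision. A hint lets an agent refine its estimate of such a pair without risking a collision in that round.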

Taxonomy

Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Optimization and Search Problems

Methods: Hierarchical Information Threading