Heterogeneous Multi-Agent Bandits with Parsimonious Hints
Amirmahdi Mirfakhar, Xuchuang Wang, Jinhang Zuo, Yair Zick, Mohammad Hajiesmaili

TL;DR
This paper introduces a new multi-agent bandit framework with hints, proposing algorithms for centralized and decentralized settings that minimize hints while achieving near-optimal regret, supported by theoretical bounds and simulations.
Contribution
It develops novel algorithms for heterogeneous multi-agent bandits with hints, providing regret and hint complexity bounds, and establishes their optimality through lower bounds.
Findings
GP-HCLA achieves $O(M^4K)$ regret with $O(MK \log T)$ hints.
Decentralized algorithms reach $O(M^3K^2)$ regret with $O(M^3K \log T)$ hints.
Lower bounds confirm the optimality of proposed algorithms.
Abstract
We study a hinted heterogeneous multi-agent multi-armed bandits problem (HMA2B), where agents can query low-cost observations (hints) in addition to pulling arms. In this framework, each of the $M$ agents has a unique reward distribution over $K$ arms, and in $T$ rounds, they can observe the reward of the arm they pull only if no other agent pulls that arm. The goal is to maximize the total utility by querying the minimal necessary hints without pulling arms, achieving time-independent regret. We study HMA2B in both centralized and decentralized setups. Our main centralized algorithm, GP-HCLA, which is an extension of HCLA, uses a central decision-maker for arm-pulling and hint queries, achieving $O(M^4K)$ regret with $O(MK \log T)$ adaptive hints. In decentralized setups, we propose two algorithms, HD-ETC and EBHD-ETC, that allow agents to choose actions independently through…
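The collision rule in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not the paper's GP-HCLA, HD-ETC, or EBHD-ETC: it assumes Bernoulli rewards, and the names `play_round`, `best_matching`, and `means` are hypothetical. It simulates one round in which an agent observes a reward only when it is alone on its arm, and it brute-forces the collision-free assignment with the highest total mean reward, which is the benchmark that regret is naturally measured against in this kind of setting.

```python
import itertools
import random


def play_round(means, pulls, rng):
    """Simulate one round in which agent m pulls arm pulls[m].

    Assumption: rewards are Bernoulli with mean means[m][arm].
    An agent observes its reward only when no other agent pulled
    the same arm; on a collision it observes nothing (None).
    """
    counts = {}
    for arm in pulls:
        counts[arm] = counts.get(arm, 0) + 1
    observations = []
    for agent, arm in enumerate(pulls):
        if counts[arm] > 1:
            observations.append(None)  # collision: nothing observed
        else:
            observations.append(1 if rng.random() < means[agent][arm] else 0)
    return observations


def best_matching(means):
    """Brute-force the collision-free assignment with the largest total mean.

    Only practical for tiny instances; used here purely for illustration.
    """
    num_agents, num_arms = len(means), len(means[0])
    best = max(
        itertools.permutations(range(num_arms), num_agents),
        key=lambda arms: sum(means[m][a] for m, a in enumerate(arms)),
    )
    return list(best), sum(means[m][a] for m, a in enumerate(best))


# Hypothetical instance: 2 agents (rows), 3 arms (columns).
means = [
    [0.9, 0.8, 0.1],
    [0.9, 0.2, 0.3],
]
matching, value = best_matching(means)
collided = play_round(means, [0, 0], random.Random(0))
separate = play_round(means, matching, random.Random(0))
```

In this instance both agents prefer arm 0 individually, but pulling it together yields no observation for either of them. The best joint assignment instead sends agent 0 to arm 1 and agent 1 to arm 0, which is why heterogeneous rewards turn the problem into a matching problem and not a per-agent best-arm search.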
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Optimization and Search Problems
Methods: Hierarchical Information Threading
