Delay and Cooperation in Nonstochastic Bandits
Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour, and Alberto Minora

TL;DR
This paper introduces a cooperative algorithm for multi-agent nonstochastic bandit problems and proves regret bounds that improve on the noncooperative case while accounting for communication delays and network structure.
Contribution
The paper proposes Exp3-Coop, a novel cooperative algorithm with theoretical regret bounds that adapt to network delays and structure, advancing multi-agent bandit learning.
Findings
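The regret bound depends on the delay parameter d and on the independence number of the d-th power of the communication graph.
For d = √K, the regret bound is of order K^(1/4)√T, strictly better than the √(KT) minimax regret for noncooperating agents.
When the communication graph is dense, more informed choices of d give bounds close to the full-information minimax regret.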
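To make the dependence on d and the graph concrete, the following minimal Python sketch evaluates the bound stated in the paper's abstract, sqrt((d + 1 + (K/N)·α_d)·T·ln K), with constants omitted. Here α_d is the independence number of the d-th power of the communication graph. The helper names (`within_d_hops`, `independence_number`, `coop_bound`), the 8-agent ring, and the values of K and T are illustrative assumptions, not taken from the paper. The independence number is found by brute force, so this only suits small graphs.

```python
import math
from collections import deque
from itertools import combinations


def within_d_hops(adj, d):
    """For each node, return the set of other nodes at hop distance <= d (BFS)."""
    reach = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == d:
                continue  # do not expand beyond d hops
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        reach[src] = set(dist) - {src}
    return reach


def independence_number(neighbors):
    """Brute-force size of the largest independent set (small graphs only)."""
    nodes = list(neighbors)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(v not in neighbors[u] for u, v in combinations(subset, 2)):
                return size
    return 0


def coop_bound(adj, K, T, d):
    """Evaluate sqrt((d + 1 + (K/N) * alpha_d) * T * ln K), constants omitted."""
    N = len(adj)
    alpha_d = independence_number(within_d_hops(adj, d))
    return math.sqrt((d + 1 + K / N * alpha_d) * T * math.log(K))


# Hypothetical example: 8 agents on a ring, K = 16 actions, T = 10000 rounds.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
for d in (0, 1, 2):
    alpha_d = independence_number(within_d_hops(ring, d))
    print(d, alpha_d, round(coop_bound(ring, 16, 10_000, d), 1))
```

With d = 0 no messages are exchanged, every agent is isolated in the power graph, and α_d equals N. Increasing d shrinks α_d but adds to the d + 1 delay term, which is the trade-off the bound captures.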
Abstract
We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than $d$ hops to arrive, where $d$ is a delay parameter. We introduce \textsc{Exp3-Coop}, a cooperative version of the \textsc{Exp3} algorithm, and prove that with $K$ actions and $N$ agents the average per-agent regret after $T$ rounds is at most of order $\sqrt{\bigl(d+1+\tfrac{K}{N}\alpha_{\le d}\bigr)(T\ln K)}$, where $\alpha_{\le d}$ is the independence number of the $d$-th power of the connected communication graph $G$. We then show that for any connected graph, for $d=\sqrt{K}$ the regret bound is $K^{1/4}\sqrt{T}$, strictly better than the minimax regret $\sqrt{KT}$ for noncooperating agents. More informed choices of $d$ lead to bounds…
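For orientation, here is a minimal Python sketch of the standard single-agent Exp3 algorithm that Exp3-Coop builds on. It is not the cooperative algorithm from the paper: the exchange of messages between agents and the delay handling are not specified in this summary and are left out. The function name `exp3`, the `loss_fn` callback, the toy loss sequence, and the learning-rate choice are all illustrative assumptions.

```python
import math
import random


def exp3(loss_fn, K, T, eta, rng=None):
    """Single-agent Exp3 with importance-weighted loss estimates.

    loss_fn(t, i) must return the loss of action i at round t, in [0, 1].
    Returns the total loss incurred and the sampling distribution used
    in the last round.
    """
    rng = rng or random.Random(0)
    cum_est = [0.0] * K  # cumulative estimated loss per action
    total_loss = 0.0
    probs = [1.0 / K] * K
    for t in range(T):
        # Exponential weights on estimated losses (shifted for stability).
        low = min(cum_est)
        weights = [math.exp(-eta * (c - low)) for c in cum_est]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        action = rng.choices(range(K), weights=probs)[0]
        loss = loss_fn(t, action)  # bandit feedback: only this loss is seen
        total_loss += loss
        # Unbiased estimate: observed loss divided by its sampling probability.
        cum_est[action] += loss / probs[action]
    return total_loss, probs


# Hypothetical toy problem: action 2 always has loss 0, all others loss 1.
K, T = 4, 2000
eta = math.sqrt(2 * math.log(K) / (T * K))  # a common textbook tuning (assumption)
total_loss, probs = exp3(lambda t, i: 0.0 if i == 2 else 1.0, K, T, eta)
print(total_loss, [round(p, 3) for p in probs])
```

The importance-weighted estimate is what a cooperative variant has more information to improve on: an agent that also learns the losses of actions played by nearby agents observes each action more often, which lowers the variance of its estimates.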
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Age of Information Optimization
