Delay and Cooperation in Nonstochastic Bandits

Nicolo' Cesa-Bianchi, Claudio Gentile, Yishay Mansour, and Alberto Minora

arXiv:1602.04741·cs.LG·June 2, 2016·60 cites

Open Access

TL;DR

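This paper introduces Exp3-Coop, a cooperative algorithm for multi-agent nonstochastic bandit problems. Agents exchange messages over a communication network, subject to a delay parameter d, and achieve per-agent regret bounds that depend on d and on the network structure and that improve on the noncooperative minimax regret.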

Contribution

The paper proposes \textsc{Exp3-Coop}, a cooperative version of the \textsc{Exp3} algorithm, and proves regret bounds for it that depend on the delay parameter $d$ and on the structure of the communication network.

Findings

01

The regret bound depends on the delay parameter d and on the independence number of the d-th power of the communication graph.

02

For d = √K and any connected graph, the regret bound is K^{1/4}√T, strictly better than the minimax regret √(KT) of noncooperating agents.

03

Dense graphs allow regret close to full-information bounds.

Abstract

We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than $d$ hops to arrive, where $d$ is a delay parameter. We introduce \textsc{Exp3-Coop}, a cooperative version of the \textsc{Exp3} algorithm, and prove that with $K$ actions and $N$ agents the average per-agent regret after $T$ rounds is at most of order $\sqrt{\bigl(d + 1 + \frac{K}{N}\alpha_{\leq d}\bigr)(T \ln K)}$, where $\alpha_{\leq d}$ is the independence number of the $d$-th power of the connected communication graph $G$. We then show that for any connected graph, for $d = \sqrt{K}$ the regret bound is $K^{1/4}\sqrt{T}$, strictly better than the minimax regret $\sqrt{KT}$ for noncooperating agents. More informed choices of $d$ lead to bounds…
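
To make the dependence on $d$ and on the graph concrete, the following minimal Python sketch evaluates the leading term of the bound, $\sqrt{(d + 1 + \frac{K}{N}\alpha_{\leq d})\,T \ln K}$, with constant factors ignored. It is an illustration of the formula only, not an implementation of Exp3-Coop. The function names (`graph_power`, `independence_number`, `coop_bound`), the adjacency-dict representation, and the 6-node ring are all assumptions made for this example, and the brute-force independence number is practical only for very small graphs.

```python
import itertools
import math


def graph_power(adj, d):
    """Adjacency sets of the d-th power of a graph.

    Nodes u and v are adjacent in the power graph when their
    shortest-path distance in `adj` is between 1 and d.
    """
    power = {}
    for source in adj:
        seen = {source}
        frontier = [source]
        # Breadth-first search, truncated after d levels.
        for _ in range(d):
            next_frontier = []
            for u in frontier:
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        next_frontier.append(v)
            frontier = next_frontier
        power[source] = seen - {source}
    return power


def independence_number(adj):
    """Exact independence number by brute force (small graphs only)."""
    nodes = list(adj)
    for size in range(len(nodes), 0, -1):
        for subset in itertools.combinations(nodes, size):
            if all(v not in adj[u]
                   for u, v in itertools.combinations(subset, 2)):
                return size
    return 0


def coop_bound(adj, d, K, T):
    """Leading term sqrt((d + 1 + (K/N) * alpha_d) * T * ln K).

    Constant factors are dropped, so only comparisons between
    values are meaningful, not the absolute numbers.
    """
    N = len(adj)
    alpha = independence_number(graph_power(adj, d))
    return math.sqrt((d + 1 + K / N * alpha) * T * math.log(K))


# Hypothetical example network: a ring of 6 agents.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}

for d in range(4):
    alpha = independence_number(graph_power(ring, d))
    print(d, alpha, round(coop_bound(ring, d, K=16, T=10_000), 1))
```

With $d = 0$ no messages are used, the power graph has no edges, and $\alpha_{\leq 0} = N$, so the expression reduces to $\sqrt{(1 + K)\,T \ln K}$. Increasing $d$ shrinks $\alpha_{\leq d}$ while growing the additive $d + 1$ term, which is the trade-off behind the choice of $d$ discussed in the abstract.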

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Age of Information Optimization