A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs

Chloé Rouyer, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin

arXiv:2206.00557·cs.LG·June 2, 2022·1 cites

TL;DR

This paper introduces a computationally efficient online learning algorithm with feedback graphs that achieves near-optimal regret bounds in both stochastic and adversarial environments, adapting to changing feedback structures.

Contribution

It presents a novel algorithm combining EXP3++ and EXP3.G ideas, with a new exploration scheme that exploits graph structure for best-of-both-worlds guarantees.

Findings

01

Achieves $\tilde{O}(\sqrt{\alpha T})$ regret against adversaries.

02

Achieves $O((\ln T)^{2} \max_{S \in \mathcal{I}(G)} \sum_{i \in S} \Delta_{i}^{-1})$ regret in stochastic environments.

03

Extends to dynamic feedback graphs changing over time.
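
To make the stochastic bound concrete, the graph-dependent factor $\max_{S \in \mathcal{I}(G)} \sum_{i \in S} \Delta_{i}^{-1}$ can be evaluated by brute force on a small example. The sketch below is illustrative only and is not taken from the paper: the function name `max_inverse_gap_sum` is hypothetical, the graph is assumed to be given already in undirected form as an edge list, and arms with zero gap (optimal arms) are assumed to be excluded from the sum.

```python
from itertools import combinations


def max_inverse_gap_sum(num_arms, undirected_edges, gaps):
    """Brute-force max over independent sets S of sum_{i in S} 1 / gaps[i].

    Assumptions (not from the paper): the graph is already undirected,
    and arms with gap 0 are skipped. Exponential time; small graphs only.
    """
    adjacent = {frozenset(edge) for edge in undirected_edges}
    suboptimal = [i for i in range(num_arms) if gaps[i] > 0]
    best = 0.0
    for size in range(1, len(suboptimal) + 1):
        for subset in combinations(suboptimal, size):
            # A set is independent if no two of its arms share an edge.
            independent = all(
                frozenset(pair) not in adjacent
                for pair in combinations(subset, 2)
            )
            if independent:
                best = max(best, sum(1.0 / gaps[i] for i in subset))
    return best


# Path graph 0 - 1 - 2 - 3 where arm 0 is optimal.
path_edges = [(0, 1), (1, 2), (2, 3)]
gaps = [0.0, 0.1, 0.2, 0.5]
print(max_inverse_gap_sum(4, path_edges, gaps))
```

On the path graph the maximizing independent set is {1, 3}. With no edges at all (the bandit case) every suboptimal arm contributes, while on a complete graph only the arm with the smallest gap does, which is how the factor reflects the amount of side information in the graph.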

Abstract

We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is $\tilde{O}(\sqrt{\alpha T})$, where $T$ is the time horizon and $\alpha$ is the independence number of the feedback graph. The bound against stochastic environments is $O((\ln T)^{2} \max_{S \in \mathcal{I}(G)} \sum_{i \in S} \Delta_{i}^{-1})$, where $\mathcal{I}(G)$ is the family of all independent sets in a suitably defined undirected version of the graph and $\Delta_{i}$ are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits…
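
As a rough illustration of how a directed feedback graph determines what the learner sees, the sketch below computes the standard importance-weighted loss estimates used by EXP3-style algorithms with graph feedback. It is a minimal sketch under stated assumptions, not the paper's algorithm: the names `observation_probs` and `loss_estimates` are hypothetical, the graph is assumed to be an adjacency list in which `out_neighbors[i]` lists the arms whose losses are revealed when arm `i` is played, and the played arm is assumed to have positive probability.

```python
def observation_probs(probs, out_neighbors):
    """Probability that each arm's loss is observed.

    Arm j is observed whenever the played arm i has an edge i -> j,
    so its observation probability is the sum of probs[i] over such i.
    """
    observed_prob = [0.0] * len(probs)
    for i, p in enumerate(probs):
        for j in out_neighbors[i]:
            observed_prob[j] += p
    return observed_prob


def loss_estimates(played, losses, probs, out_neighbors):
    """Importance-weighted estimates: loss / P(observed) if observed, else 0."""
    observed_prob = observation_probs(probs, out_neighbors)
    observed = set(out_neighbors[played])
    return [
        losses[j] / observed_prob[j] if j in observed else 0.0
        for j in range(len(probs))
    ]


# Hypothetical 3-arm graph: playing 0 reveals arms 0 and 1,
# playing 1 reveals only arm 1, playing 2 reveals arms 2 and 0.
out_neighbors = {0: [0, 1], 1: [1], 2: [2, 0]}
probs = [0.5, 0.25, 0.25]
losses = [0.3, 0.6, 0.9]
print(loss_estimates(0, losses, probs, out_neighbors))
```

Averaging the estimates over the choice of played arm, weighted by `probs`, recovers the true losses for every arm that is observed with positive probability, which is the unbiasedness property such estimators are built around.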

Videos

A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs · SlidesLive

Taxonomy

Topics: Optimization and Search Problems · Machine Learning and Algorithms · Advanced Bandit Algorithms Research