A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs
Chloé Rouyer, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin

TL;DR
This paper introduces a computationally efficient online learning algorithm with feedback graphs that achieves near-optimal regret bounds in both stochastic and adversarial environments, adapting to changing feedback structures.
Contribution
It presents a novel algorithm combining EXP3++ and EXP3.G ideas, with a new exploration scheme that exploits graph structure for best-of-both-worlds guarantees.
Findings
Achieves $\tilde{O}(\sqrt{\alpha T})$ regret against adversaries.
Achieves $O\big((\ln T)^2 \max_{S \in \mathcal{I}(G)} \sum_{i \in S} \Delta_i^{-1}\big)$ regret in stochastic environments.
Extends to dynamic feedback graphs changing over time.
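To make the two quantities in these bounds concrete, the following is a minimal brute-force sketch that computes the independence number $\alpha$ and the term $\max_{S \in \mathcal{I}(G)} \sum_{i \in S} \Delta_i^{-1}$ on a toy graph. It is an illustration, not the paper's algorithm. The function names, the 4-node path graph, and the gap values are hypothetical. The sketch also assumes the undirected graph is given directly (the paper's own construction of the undirected version is not reproduced here) and that actions with zero gap are skipped in the sum.

```python
from itertools import combinations
import math


def independent_sets(n, edges):
    """Yield every independent set (as a tuple) of an undirected graph on nodes 0..n-1.

    Brute force over all subsets, so only suitable for very small graphs.
    """
    # Self-loops are ignored: they do not connect two distinct actions.
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                yield subset


def independence_number(n, edges):
    """Size of the largest independent set (alpha in the adversarial bound)."""
    return max(len(s) for s in independent_sets(n, edges))


def stochastic_gap_term(n, edges, gaps):
    """max over independent sets S of sum_{i in S} 1 / Delta_i.

    Assumption: actions with zero gap (optimal actions) are skipped.
    """
    return max(
        sum(1.0 / gaps[i] for i in s if gaps[i] > 0)
        for s in independent_sets(n, edges)
    )


# Hypothetical example: path graph 0 - 1 - 2 - 3, action 0 is optimal.
edges = [(0, 1), (1, 2), (2, 3)]
gaps = [0.0, 0.1, 0.2, 0.5]
T = 10_000

alpha = independence_number(4, edges)
print("alpha =", alpha)
print("sqrt(alpha * T) =", math.sqrt(alpha * T))
print("gap term =", stochastic_gap_term(4, edges, gaps))
```

On this toy graph the largest independent sets have two nodes, and the gap term is maximized by the set {1, 3}. Both bounds hide constants and logarithmic factors, so the printed numbers indicate scaling only.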
Abstract
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is $\tilde{O}(\sqrt{\alpha T})$, where $T$ is the time horizon and $\alpha$ is the independence number of the feedback graph. The bound against stochastic environments is $O\big((\ln T)^2 \max_{S \in \mathcal{I}(G)} \sum_{i \in S} \Delta_i^{-1}\big)$, where $\mathcal{I}(G)$ is the family of all independent sets in a suitably defined undirected version of the graph and $\Delta_i$ are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits…
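As a rough illustration of how a directed graph determines feedback, the sketch below uses the common convention in this literature that playing an action reveals the losses of its out-neighbors. The abstract is truncated here, so the paper's exact observation model may differ in details. The names `observed_losses` and `out_neighbors`, the 3-action graph, and the loss values are all hypothetical.

```python
def observed_losses(out_neighbors, losses, action):
    """Return {j: loss_j} for every action j revealed by playing `action`.

    Assumption: an edge action -> j means that playing `action`
    reveals the loss of j in that round.
    """
    return {j: losses[j] for j in out_neighbors[action]}


# Hypothetical 3-action graph: every action has a self-loop (it sees its
# own loss), and action 0 additionally reveals the loss of action 1.
out_neighbors = {0: [0, 1], 1: [1], 2: [2]}
losses = [0.3, 0.7, 0.1]

print(observed_losses(out_neighbors, losses, 0))  # {0: 0.3, 1: 0.7}
print(observed_losses(out_neighbors, losses, 2))  # {2: 0.1}
```

Under this convention, a graph with only self-loops corresponds to the bandit setting, and a complete graph corresponds to full-information feedback.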
Taxonomy
Topics: Optimization and Search Problems · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
