Best-of-All-Worlds Bounds for Online Learning with Feedback Graphs
Liad Erez, Tomer Koren

TL;DR
This paper introduces a new algorithm for online learning with feedback graphs that achieves optimal regret bounds in both adversarial and stochastic settings, adapting to the feedback graph's structure.
Contribution
It develops a Follow-the-Regularized-Leader algorithm whose novel regularizer is a product of a Tsallis entropy component and a Shannon entropy component, and which achieves best-of-all-worlds regret bounds.
Findings
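Achieves $\tilde{O}(\sqrt{\theta(G) T})$ regret under adversarial losses, where $\theta(G)$ is the clique covering number of the feedback graph $G$ and $T$ is the number of rounds.
Achieves $\tilde{O}(\theta(G))$ regret under stochastic losses.
Retains a comparable regret guarantee under stochastic losses with adversarial corruptions.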
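To make the role of $\theta(G)$ concrete, here is a minimal sketch of a greedy clique cover. It is an illustration, not part of the paper's algorithm. It assumes an undirected feedback graph stored as an adjacency dictionary, and the function name `greedy_clique_cover` and the 5-action graph are hypothetical. The greedy count is only an upper bound on $\theta(G)$, since computing the clique covering number exactly is NP-hard in general.

```python
from typing import Dict, List, Set


def greedy_clique_cover(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Greedily partition the vertices of an undirected graph into cliques.

    The number of cliques returned upper-bounds the clique covering
    number theta(G); it is not guaranteed to equal it.
    """
    uncovered = set(adj)
    cover: List[Set[int]] = []
    while uncovered:
        v = min(uncovered)          # deterministic choice of a seed vertex
        clique = {v}
        for u in sorted(uncovered - {v}):
            # u joins only if it is adjacent to every vertex already in the clique
            if all(u in adj[w] for w in clique):
                clique.add(u)
        cover.append(clique)
        uncovered -= clique
    return cover


# Hypothetical graph: actions 0, 1, 2 observe one another; so do actions 3, 4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
cover = greedy_clique_cover(adj)
bound = len(cover)  # upper bound on theta(G) for this graph
```

On this graph the cover has two cliques while there are five actions, so a bound that scales with $\theta(G)$ is smaller than one that scales with the number of actions.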
Abstract
We study the online learning with feedback graphs framework introduced by Mannor and Shamir (2011), in which the feedback received by the online learner is specified by a graph $G$ over the available actions. We develop an algorithm that simultaneously achieves regret bounds of the form: $\tilde{O}(\sqrt{\theta(G) T})$ with adversarial losses; $\tilde{O}(\theta(G))$ with stochastic losses; and $\tilde{O}(\theta(G) + \sqrt{\theta(G) C})$ with stochastic losses subject to $C$ adversarial corruptions. Here, $\theta(G)$ is the clique covering number of the graph $G$. Our algorithm is an instantiation of Follow-the-Regularized-Leader with a novel regularization that can be seen as a product of a Tsallis entropy component (inspired by Zimmert and Seldin (2019)) and a Shannon entropy component (analyzed in the corrupted…
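The feedback model in the abstract can be illustrated with a short sketch of the standard importance-weighted loss estimator used in the feedback-graph literature. This is not claimed to be the estimator used in the paper. It assumes an undirected graph in which playing an action reveals its own loss and the losses of its neighbours. The function names, the graph, the sampling distribution and the loss values are all hypothetical.

```python
from typing import Dict, List, Set


def observed_actions(adj: Dict[int, Set[int]], played: int) -> Set[int]:
    """Actions whose losses are revealed when `played` is chosen (assumed model)."""
    return adj[played] | {played}


def importance_weighted_losses(
    adj: Dict[int, Set[int]],
    probs: List[float],
    played: int,
    losses: List[float],
) -> List[float]:
    """Unbiased loss estimates: observed loss divided by its observation probability."""
    estimates = [0.0] * len(probs)
    for i in observed_actions(adj, played):
        # Action i is observed whenever i itself or one of its neighbours is played.
        obs_prob = probs[i] + sum(probs[j] for j in adj[i])
        estimates[i] = losses[i] / obs_prob
    return estimates


# Hypothetical instance: two cliques {0, 1, 2} and {3, 4}, uniform sampling.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
probs = [0.2] * 5
losses = [0.5, 0.1, 0.9, 0.3, 0.7]
est = importance_weighted_losses(adj, probs, played=1, losses=losses)

# Averaging the estimates over the random choice of action recovers the true losses.
expected = [
    sum(probs[a] * importance_weighted_losses(adj, probs, a, losses)[i] for a in adj)
    for i in range(5)
]
```

Playing action 1 yields estimates for actions 0, 1 and 2 only. Actions 3 and 4 receive an estimate of zero in that round, and the division by the observation probability compensates for this in expectation.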
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
