Best-of-All-Worlds Bounds for Online Learning with Feedback Graphs

Liad Erez; Tomer Koren

arXiv:2107.09572 · cs.LG · July 21, 2021

TL;DR

This paper introduces a new algorithm for online learning with feedback graphs that achieves optimal regret bounds in both adversarial and stochastic settings, adapting to the feedback graph's structure.

Contribution

It develops a Follow-the-Regularized-Leader algorithm whose novel regularizer combines a Tsallis entropy component with a Shannon entropy component, achieving best-of-all-worlds regret bounds.

Findings

01

Achieves $\tilde{O}(\sqrt{\theta(G) T})$ regret with adversarial losses.

02

Achieves $\tilde{O}(\theta(G))$ regret with stochastic losses.

03

Handles stochastic losses subject to $C$ adversarial corruptions, with regret that degrades gracefully as $C$ grows.

Abstract

We study the online learning with feedback graphs framework introduced by Mannor and Shamir (2011), in which the feedback received by the online learner is specified by a graph $G$ over the available actions. We develop an algorithm that simultaneously achieves regret bounds of the form: $\tilde{O}(\sqrt{\theta(G) T})$ with adversarial losses; $O(\theta(G) \operatorname{polylog} T)$ with stochastic losses; and $O(\theta(G) \operatorname{polylog} T + \sqrt{\theta(G) C})$ with stochastic losses subject to $C$ adversarial corruptions. Here, $\theta(G)$ is the clique covering number of the graph $G$. Our algorithm is an instantiation of Follow-the-Regularized-Leader with a novel regularization that can be seen as a product of a Tsallis entropy component (inspired by Zimmert and Seldin (2019)) and a Shannon entropy component (analyzed in the corrupted…
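Every bound in the abstract scales with the clique covering number $\theta(G)$, the smallest number of cliques needed to cover all actions. The sketch below is an illustration only and is not taken from the paper: it assumes an undirected feedback graph stored as an adjacency dictionary, and it uses a simple greedy heuristic (the function name `greedy_clique_cover` and the example graph are hypothetical). Because computing $\theta(G)$ exactly is NP-hard in general, the greedy result is only an upper bound on $\theta(G)$.

```python
from typing import Dict, List, Set


def greedy_clique_cover(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Greedily partition the vertices of an undirected graph into cliques.

    `adj` maps each action to the set of its neighbours (self-loops are
    implicit). The number of cliques returned upper-bounds theta(G); it is
    not guaranteed to be the minimum.
    """
    uncovered = set(adj)
    cliques: List[Set[int]] = []
    while uncovered:
        # Start a new clique from the smallest uncovered vertex.
        seed = min(uncovered)
        clique = {seed}
        # Add any uncovered vertex adjacent to every current clique member.
        for u in sorted(uncovered - {seed}):
            if all(u in adj[w] for w in clique):
                clique.add(u)
        cliques.append(clique)
        uncovered -= clique
    return cliques


# Hypothetical feedback graph: a triangle {0, 1, 2} plus an edge {3, 4}.
example_graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1},
    3: {4},
    4: {3},
}

cover = greedy_clique_cover(example_graph)
print(len(cover), cover)
```

On this example the greedy cover consists of the triangle and the edge, so the bound is 2. The two extremes are a complete graph (full information), which one clique covers, and a graph with no edges (bandit feedback), which needs one clique per action.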

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques