Online Learning with Feedback Graphs: Beyond Bandits
Noga Alon, Nicolò Cesa-Bianchi, Ofer Dekel, Tomer Koren

TL;DR
This paper classifies feedback graphs in online learning problems and characterizes how their structure influences the minimax regret, extending previous work and connecting to partial monitoring games.
Contribution
It introduces a classification of feedback graphs into three classes and derives regret bounds for each, generalizing prior results and analyzing time-varying graphs.
Findings
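Strongly observable graphs lead to $\widetilde{\Theta}(\alpha^{1/2} T^{1/2})$ minimax regret, where $\alpha$ is the independence number of the graph and $T$ is the number of rounds.
Weakly observable graphs lead to $\widetilde{\Theta}(\delta^{1/3} T^{2/3})$ minimax regret, where $\delta$ is the domination number of a certain portion of the graph.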
Unobservable graphs result in linear regret.
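The three-way classification can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper. It assumes the usual definitions from the full text, which this summary does not spell out: an action is observable if at least one action (possibly itself) reveals its loss, and strongly observable if it has a self-loop or is revealed by every other action. The function name `classify_feedback_graph` and the adjacency-dictionary representation are hypothetical choices made for this example.

```python
from typing import Dict, Set


def classify_feedback_graph(out_edges: Dict[int, Set[int]]) -> str:
    """Classify a directed feedback graph.

    out_edges[i] is the set of actions whose loss is revealed when
    action i is played (a self-loop i -> i means the player sees
    their own loss). Definitions are assumed, see the lead-in.
    """
    # Collect every vertex, including ones that only appear as targets.
    vertices = set(out_edges)
    for targets in out_edges.values():
        vertices |= set(targets)

    # in_neighbors[j] = actions whose play reveals the loss of j.
    in_neighbors = {v: set() for v in vertices}
    for i, targets in out_edges.items():
        for j in targets:
            in_neighbors[j].add(i)

    all_strong = True
    for v in vertices:
        sources = in_neighbors[v]
        if not sources:
            # No action ever reveals the loss of v.
            return "unobservable"
        has_self_loop = v in sources
        seen_by_all_others = (vertices - {v}) <= sources
        if not (has_self_loop or seen_by_all_others):
            all_strong = False

    return "strongly observable" if all_strong else "weakly observable"


# Multi-armed bandit: each action reveals only its own loss.
bandit = {0: {0}, 1: {1}, 2: {2}}
# Experts: every action reveals every loss.
experts = {i: {0, 1, 2} for i in range(3)}
# One "revealing" action; the other two reveal nothing.
revealing = {0: {0, 1, 2}, 1: set(), 2: set()}
# Action 1 is never revealed by anything.
hidden = {0: {0}, 1: set()}

for name, g in [("bandit", bandit), ("experts", experts),
                ("revealing", revealing), ("hidden", hidden)]:
    print(name, "->", classify_feedback_graph(g))
```

Under these assumed definitions, the bandit and experts graphs fall in the strongly observable class, the graph with a single revealing action is weakly observable, and a graph containing an action that no play reveals is unobservable.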
Abstract
We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced $T$-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: strongly observable graphs, weakly observable graphs, and unobservable graphs. We prove that the first class induces learning problems with $\widetilde{\Theta}(\alpha^{1/2} T^{1/2})$ minimax regret, where $\alpha$ is the independence number of the underlying graph; the second class induces problems with $\widetilde{\Theta}(\delta^{1/3} T^{2/3})$ minimax regret, where $\delta$ is the domination number of…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
