Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback
Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, Ohad Shamir

TL;DR
This paper introduces a partial-information online learning model with graph-structured feedback that interpolates between the full-information and bandit settings. It provides algorithms with tight regret bounds that depend on combinatorial properties of the feedback structure.
Contribution
It proposes a new graph-structured feedback model for online learning and develops algorithms with tight regret bounds for this setting.
Findings
The algorithms achieve tight regret bounds that depend on combinatorial properties of the feedback graph.
The model generalizes existing full-information and bandit frameworks.
The paper provides a theoretical analysis of regret under graph-structured feedback.
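To illustrate how a single graph property can interpolate between the two extremes, the sketch below computes the independence number alpha of a small undirected feedback graph by brute force and plugs it into a sqrt(alpha T ln K) rate. That rate is our assumption about the form of the bounds for undirected graphs, shown without constants. The helpers `clique_edges`, `independence_number`, and `regret_scale` are hypothetical names introduced only for this example. A clique (every action reveals every loss) gives alpha = 1, the full-information rate. A graph with only self-loops gives alpha = K, the bandit rate.

```python
from itertools import combinations
import math


def clique_edges(nodes):
    """All undirected edges among `nodes` (every pair observes each other)."""
    return {frozenset(pair) for pair in combinations(nodes, 2)}


def independence_number(k, edges):
    """Largest set of actions with no feedback edge between any two of them.

    Brute force over subsets, so only practical for small K. `edges` holds
    frozenset({i, j}) pairs; self-loops (each action reveals its own loss)
    are implicit and do not affect the result.
    """
    for size in range(k, 0, -1):
        for subset in combinations(range(k), size):
            if all(frozenset(pair) not in edges for pair in combinations(subset, 2)):
                return size
    return 0


def regret_scale(alpha, horizon, k):
    """Assumed rate sqrt(alpha * T * ln K), constants omitted."""
    return math.sqrt(alpha * horizon * math.log(k))


K, T = 8, 10_000
graphs = {
    "clique (full information)": clique_edges(range(K)),
    "two disjoint 4-cliques (hypothetical)": clique_edges(range(4)) | clique_edges(range(4, 8)),
    "self-loops only (bandit)": set(),
}
alphas = {name: independence_number(K, g) for name, g in graphs.items()}
for name, alpha in alphas.items():
    print(f"{name}: alpha = {alpha}, scale ~ {regret_scale(alpha, T, K):.0f}")
```

The intermediate graph, two groups of actions that observe each other, lands between the two extremes with alpha = 2. This is the kind of interpolation the findings describe.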
Abstract
We present and study a partial-information model of online learning, where a decision maker repeatedly chooses from a finite set of actions, and observes some subset of the associated losses. This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions. Moreover, it generalizes and interpolates between the well studied full-information setting (where all losses are revealed) and the bandit setting (where only the loss of the action chosen by the player is revealed). We provide several algorithms addressing different variants of our setting, and provide tight regret bounds depending on combinatorial properties of the information feedback structure.
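The setting described in the abstract can be made concrete with a small simulation. The sketch below uses an exponentially weighted forecaster with importance-weighted loss estimates, in the spirit of Exp3. It illustrates how side observations can be exploited and is not the authors' exact algorithm. It assumes losses in [0, 1] fixed in advance by an adversary and a fixed feedback graph given as `neighbors[j]` (a hypothetical name): the set of actions whose losses are revealed when j is played, including j itself. The learning rate `eta` is an assumed tuning.

```python
import math
import random


def exp_weights_graph_feedback(losses, neighbors, eta, seed=0):
    """Exponential weights with importance-weighted estimates under graph feedback.

    losses[t][i]: loss in [0, 1] of action i at round t (fixed in advance).
    neighbors[j]: set of actions whose losses are revealed when j is played;
                  assumed to include j itself.
    Returns the total loss of the actions actually played.
    """
    rng = random.Random(seed)
    k = len(neighbors)
    weights = [1.0] * k
    total = 0.0
    for loss in losses:
        z = sum(weights)
        p = [w / z for w in weights]
        played = rng.choices(range(k), weights=p)[0]
        total += loss[played]
        for i in neighbors[played]:
            # q = probability that the loss of i is observed this round
            q = sum(p[j] for j in range(k) if i in neighbors[j])
            # dividing by q makes the estimate unbiased: E[estimate] = loss[i];
            # unobserved actions implicitly get estimate 0 and keep their weight
            weights[i] *= math.exp(-eta * loss[i] / q)
        top = max(weights)  # rescale to avoid floating-point underflow
        weights = [w / top for w in weights]
    return total


# Usage on a hypothetical 4-action instance: actions {0, 1} observe each
# other, as do {2, 3}; action 0 is the best in hindsight.
K, T = 4, 2000
neighbors = [{0, 1}, {0, 1}, {2, 3}, {2, 3}]
losses = [[0.2 if i == 0 else 0.8 for i in range(K)] for _ in range(T)]
eta = math.sqrt(math.log(K) / (2 * T))  # assumed tuning, alpha = 2 here
total = exp_weights_graph_feedback(losses, neighbors, eta)
best = min(sum(row[i] for row in losses) for i in range(K))
print(f"algorithm: {total:.1f}  best action: {best:.1f}")
```

The main design choice is the denominator. Plain Exp3 divides by p(i), the probability of playing i. This sketch divides by q(i), the probability of observing i's loss. An action revealed by many likely plays therefore gets a lower-variance estimate, which is how side observations can reduce regret.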
