Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback
Fang Kong, Yichi Zhou, Shuai Li

TL;DR
This paper introduces a new algorithm for online learning with general graph feedback that effectively balances exploration and exploitation, achieving near-optimal regret in both stochastic and adversarial settings.
Contribution
It presents the first best-of-both-worlds algorithm for general feedback graphs, handling both stochastic and adversarial environments without prior knowledge of which type of feedback it faces.
Findings
Achieves polylogarithmic regret in the stochastic setting
Attains minimax-optimal regret in the adversarial setting
Works with general, directed feedback graphs
Abstract
The problem of online learning with graph feedback has been extensively studied in the literature due to its generality and its potential to model various learning tasks. Existing works mainly study adversarial and stochastic feedback separately. If prior knowledge of the feedback mechanism is unavailable or wrong, such specially designed algorithms can suffer great loss. To avoid this problem, Erez and Koren (2021) try to optimize for both environments. However, they assume that the feedback graphs are undirected and that each vertex has a self-loop, which compromises the generality of the framework and may not hold in applications. With a general feedback graph, the loss of an arm may not be observed when that arm is pulled, which makes exploration more expensive and makes it harder for algorithms to perform optimally in both environments. In this work, we overcome…
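To make the difficulty concrete, here is a minimal sketch of a directed feedback graph in which pulling an arm reveals the losses of its out-neighbors, and arm 0 has no self-loop. It uses the standard importance-weighted loss estimator from the feedback-graph literature (dividing an observed loss by the probability that the arm is observed). This is an illustrative assumption, not the authors' algorithm; the graph, probabilities, losses and all function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical representation: graph[i] is the set of arms whose losses are
# revealed when arm i is pulled (its out-neighbors in the feedback graph).
Graph = Dict[int, Set[int]]


def observed_arms(graph: Graph, pulled: int) -> Set[int]:
    """Arms whose losses the learner sees after pulling `pulled`."""
    return set(graph[pulled])


def observation_prob(graph: Graph, probs: List[float], arm: int) -> float:
    """P(arm is observed) = sum of p_j over all arms j with an edge j -> arm."""
    return sum(p for j, p in enumerate(probs) if arm in graph[j])


def loss_estimates(graph: Graph, probs: List[float], pulled: int,
                   losses: List[float]) -> List[float]:
    """Importance-weighted estimates: loss_i * 1{i observed} / P(i observed).

    Unbiased for every arm i with P(i observed) > 0 when `pulled` is drawn
    from `probs`; arms that cannot be observed under `probs` get 0.
    """
    seen = observed_arms(graph, pulled)
    estimates = []
    for i, loss in enumerate(losses):
        q = observation_prob(graph, probs, i)
        estimates.append(loss / q if i in seen and q > 0 else 0.0)
    return estimates


if __name__ == "__main__":
    # Arm 0 has no self-loop: its loss is revealed only when arm 1 is pulled.
    graph = {0: {1, 2}, 1: {0, 1}, 2: {2}}
    probs = [0.5, 0.25, 0.25]
    losses = [0.2, 0.6, 0.4]
    print(observed_arms(graph, 0))  # {1, 2}: pulling arm 0 hides its own loss
    print(observation_prob(graph, probs, 0))
    print(loss_estimates(graph, probs, 1, losses))
```

In this toy graph the learner sees arm 0's loss only with probability 0.25 (when it pulls arm 1), so its estimate for arm 0 is scaled up by a factor of 4. Keeping such observation probabilities from collapsing is one way to see why exploration becomes more expensive once self-loops are not guaranteed.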
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
