Online learning with Erdős-Rényi side-observation graphs
Tomáš Kocák, Gergely Neu, Michal Valko

TL;DR
This paper introduces algorithms for adversarial multi-armed bandit problems with side observations, achieving near-optimal regret bounds depending on the probability of observing additional arm losses.
Contribution
The paper proposes two algorithms tailored for different observation probabilities, providing near-optimal regret bounds in adversarial bandit settings with side observations.
Findings
The first algorithm achieves $O(\sqrt{(T/r) \log N})$ regret for $r \ge (\log T)/(2N)$.
The second algorithm achieves $O(\sqrt{(T/r) \log (N+T)})$ regret for smaller $r$.
A quick estimation procedure determines the relevant range of $r$.
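To make the two ranges of $r$ concrete, here is a minimal Python sketch that evaluates the boundary $(\log T)/(2N)$ and the order of the matching regret bound. The function names `regime_threshold` and `bound_order` are hypothetical, all constants hidden by the $O(\cdot)$ notation are dropped, and $r$ is passed in as if it were known. It is not the paper's estimation procedure.

```python
import math


def regime_threshold(T: int, N: int) -> float:
    """Boundary (log T) / (2N) between the two ranges of r quoted above."""
    return math.log(T) / (2 * N)


def bound_order(T: int, N: int, r: float) -> float:
    """Order of the quoted regret bound for a given r (constants dropped).

    Assumption: r is supplied by the caller; the paper treats it as unknown.
    """
    if r >= regime_threshold(T, N):
        # Range covered by the first algorithm.
        return math.sqrt((T / r) * math.log(N))
    # Smaller r: range covered by the second algorithm.
    return math.sqrt((T / r) * math.log(N + T))
```

For example, `regime_threshold(1000, 10)` is `math.log(1000) / 20`, so with $T = 1000$ and $N = 10$ a value such as $r = 0.5$ falls in the first range and $r = 0.01$ in the second.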
Abstract
We consider adversarial multi-armed bandit problems where the learner is allowed to observe losses of a number of arms beside the arm that it actually chose. We study the case where all non-chosen arms reveal their loss with a fixed but unknown probability $r$, independently of each other and the action of the learner. We propose two algorithms that work for different ranges of $r$. We show that after $T$ rounds in a bandit problem with $N$ arms, the expected regret of our first algorithm is $O(\sqrt{(T/r) \log N})$ whenever $r \ge (\log T)/(2N)$, while our second algorithm achieves a regret of $O(\sqrt{(T/r) \log (N+T)})$ for smaller values of $r$. We also give a quick estimation procedure that decides the range of $r$. All our bounds are within logarithmic factors of the best achievable performance of any algorithm that is even allowed to know $r$.
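The observation model in the abstract can be simulated with a short Python sketch. It is an illustration under a simplifying assumption, not the paper's algorithm: here the side-observation probability is treated as known and positive, whereas the paper's algorithms work without knowing it. The function name `exp3_side_obs` and the learning-rate parameter `eta` are hypothetical. The sketch runs exponential weights and uses the fact that, when the learner plays arm $i$ with probability $p_i$, the loss of arm $i$ is observed with probability $p_i + (1 - p_i) r$, so dividing each observed loss by that quantity gives an unbiased estimate.

```python
import numpy as np


def exp3_side_obs(losses, r, eta, seed=0):
    """Exponential weights with side observations; r > 0 is ASSUMED known.

    losses: array of shape (T, N) with entries in [0, 1].
    Returns the chosen arms and the cumulative loss estimates.
    """
    rng = np.random.default_rng(seed)
    T, N = losses.shape
    L_hat = np.zeros(N)  # cumulative importance-weighted loss estimates
    chosen = np.empty(T, dtype=int)
    for t in range(T):
        # Shift by the minimum for numerical stability before exponentiating.
        w = np.exp(-eta * (L_hat - L_hat.min()))
        p = w / w.sum()
        arm = rng.choice(N, p=p)
        chosen[t] = arm
        # Every arm reveals its loss independently with probability r ...
        observed = rng.random(N) < r
        # ... and the chosen arm is always observed.
        observed[arm] = True
        # Probability that each arm's loss is observed in this round.
        q = p + (1.0 - p) * r
        L_hat += losses[t] * observed / q
    return chosen, L_hat
```

A call such as `exp3_side_obs(losses, r=0.5, eta=0.05)` on a `(T, N)` array of losses in $[0, 1]$ returns the played arms together with the loss estimates that drive the sampling distribution.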
