Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application
Jianjun Yuan, Wei Lee Woon, Ludovik Coba

TL;DR
This paper introduces an efficient algorithm for adversarial sleeping bandit problems with multiple plays, applicable to online recommendation systems, providing theoretical guarantees on regret bounds in complex, adversarial environments.
Contribution
It extends existing sleeping bandit algorithms to handle multiple arm selections and adversarial settings, with proven regret bounds.
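In the sleeping-bandit literature, regret is usually measured against the best fixed ordering (ranking) of the arms. In each round, such an ordering plays its highest-ranked awake arm. This sketch assumes a natural multiple-play analogue for illustration: the ordering plays its k highest-ranked awake arms. The paper's exact comparator may differ. The helper names top_k_available and ranking_loss are hypothetical.

```python
from typing import Dict, List, Sequence, Set

def top_k_available(ranking: Sequence[int], available: Set[int], k: int) -> List[int]:
    """Return the k highest-ranked awake arms (all awake arms if fewer than k)."""
    awake_in_order = [arm for arm in ranking if arm in available]
    return awake_in_order[:k]

def ranking_loss(ranking: Sequence[int], available: Set[int],
                 losses: Dict[int, float], k: int) -> float:
    """Loss a fixed ranking incurs in one round under this assumed multiple-play comparator."""
    return sum(losses[arm] for arm in top_k_available(ranking, available, k))

# 5 arms; the ranking prefers arm 3, then 0, 4, 1, 2. Arm 3 is asleep this round.
ranking = [3, 0, 4, 1, 2]
available = {0, 1, 2, 4}
losses = {0: 0.25, 1: 0.75, 2: 0.125, 3: 0.0, 4: 0.5}
print(top_k_available(ranking, available, k=2))       # [0, 4]
print(ranking_loss(ranking, available, losses, k=2))  # 0.75
```

If fewer than k arms are awake, this sketch plays all of them. That rule is likewise an assumption.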
Findings
Regret upper bound of O(kN^2√(T log T)) achieved
Algorithm handles adversarial (bounded) losses and arm availability drawn from unknown i.i.d. distributions
Applicable to online recommendation systems that show several items per round
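The bound grows like √(T log T), which is sublinear in T, so the average regret per round tends to zero as the horizon grows. The N^2 factor means that doubling the number of arms quadruples the bound. The sketch below evaluates the expression with the constant hidden by O(·) dropped and the natural log assumed. regret_bound_shape is a hypothetical helper, and k = 3, N = 10 are arbitrary illustrative values.

```python
import math

def regret_bound_shape(k: int, n: int, t: int) -> float:
    """k * N^2 * sqrt(T log T) with the O(.) constant dropped; natural log assumed."""
    if t < 2:
        raise ValueError("need T >= 2 so that log T > 0")
    return k * n ** 2 * math.sqrt(t * math.log(t))

# Per-round average bound shrinks as T grows, i.e. the regret is sublinear in T.
for t in (10 ** 3, 10 ** 5, 10 ** 7):
    print(t, regret_bound_shape(k=3, n=10, t=t) / t)

# The N^2 factor: doubling the number of arms quadruples the bound.
print(regret_bound_shape(3, 20, 10 ** 5) / regret_bound_shape(3, 10, 10 ** 5))  # 4.0
```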
Abstract
This paper presents an efficient algorithm to solve the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial loss and unknown i.i.d. distributions for arm availability. The proposed algorithm extends the sleeping bandit algorithm for single arm selection and is guaranteed to achieve theoretical performance with regret upper bounded by O(kN^2√(T log T)), where k is the number of arms selected per time step, N is the total number of arms, and T is the time horizon.
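A small simulator can make the problem setting in the abstract concrete. It rests on these assumptions:

- Each arm is awake independently with a fixed probability that the learner never sees.
- Losses lie in [0, 1] and are fixed by an oblivious adversary.
- The learner observes the loss of every arm it plays (semi-bandit feedback).

The learner is a uniform-random baseline, not the paper's algorithm. It is included only to show the protocol and the kind of importance-weighted loss estimate that exponential-weights methods typically rely on. All function names are hypothetical.

```python
import random
from typing import Dict, List, Set

def draw_availability(probs: List[float], rng: random.Random) -> Set[int]:
    """Each arm i is awake independently with probability probs[i] (hidden from the learner)."""
    return {i for i, p in enumerate(probs) if rng.random() < p}

def uniform_k_subset(available: Set[int], k: int, rng: random.Random) -> List[int]:
    """Baseline: play min(k, m) distinct awake arms chosen uniformly at random."""
    pool = sorted(available)  # random.sample needs a sequence, not a set
    return rng.sample(pool, min(k, len(pool)))

def importance_weighted_estimates(chosen: List[int], available: Set[int],
                                  observed: Dict[int, float], k: int) -> Dict[int, float]:
    """Unbiased loss estimates for awake arms under uniform k-subset selection."""
    m = len(available)
    if m == 0:
        return {}
    p = min(k, m) / m  # probability that any given awake arm is played
    return {i: (observed[i] / p if i in chosen else 0.0) for i in available}

def run(n_arms: int = 5, k: int = 2, horizon: int = 1000, seed: int = 0) -> float:
    """Simulate the protocol and return the baseline's cumulative loss."""
    rng = random.Random(seed)
    probs = [rng.uniform(0.3, 0.9) for _ in range(n_arms)]  # unknown i.i.d. availability
    total = 0.0
    for t in range(horizon):
        available = draw_availability(probs, rng)
        # Oblivious adversary: bounded losses in [0, 1] fixed in advance (arbitrary pattern).
        losses = [((t * (i + 1)) % 7) / 6 for i in range(n_arms)]
        chosen = uniform_k_subset(available, k, rng)
        observed = {i: losses[i] for i in chosen}  # semi-bandit feedback (assumption)
        estimates = importance_weighted_estimates(chosen, available, observed, k)
        # A learning algorithm would update its weights from `estimates` here.
        total += sum(observed.values())
    return total

print(run())
```

Each awake arm is played with probability min(k, m)/m. Dividing its observed loss by that probability and averaging over the random choice therefore recovers the true loss. A learning algorithm would replace uniform_k_subset with a weight-dependent selection rule and divide by its own inclusion probabilities.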
