Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application

Jianjun Yuan; Wei Lee Woon; Ludovik Coba

arXiv:2307.14549·cs.LG·July 28, 2023

Open Access

TL;DR

This paper introduces an efficient algorithm for the adversarial sleeping bandit problem with multiple plays, motivated by online recommendation systems, and proves an upper bound on its regret.

Contribution

It extends existing sleeping bandit algorithms to handle multiple arm selections and adversarial settings, with proven regret bounds.

Findings

1. Achieves a regret upper bound of O(kN^2√(T log T)).

2. Handles bounded adversarial losses together with arm availability drawn from unknown i.i.d. distributions.

3. Applies to online recommendation systems that present multiple choices per time step.
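
To give a feel for the bound in the first finding, the sketch below evaluates k N^2 √(T log T) for concrete values. The page does not state the hidden constant in the O(·) notation, so the parameter `c` is a hypothetical placeholder (set to 1 here), and the function names are illustrative only. The point is the scaling: dividing by T shows that the bound on average regret per round shrinks as the horizon grows.

```python
import math

def regret_bound(k: int, n_arms: int, horizon: int, c: float = 1.0) -> float:
    """Evaluate c * k * N^2 * sqrt(T * log T).

    c is a hypothetical stand-in for the constant hidden by O(.);
    the source does not give its value.
    """
    return c * k * n_arms ** 2 * math.sqrt(horizon * math.log(horizon))

def per_round_bound(k: int, n_arms: int, horizon: int, c: float = 1.0) -> float:
    """Bound on the average regret per time step (total bound divided by T)."""
    return regret_bound(k, n_arms, horizon, c) / horizon

if __name__ == "__main__":
    # Same k and N, increasing horizon: the per-round bound decreases.
    for horizon in (10**3, 10**4, 10**5, 10**6):
        print(horizon, per_round_bound(k=2, n_arms=5, horizon=horizon))
```

Because √(T log T) grows more slowly than T, the per-round value tends to zero, which is what a sublinear regret guarantee means in practice.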

Abstract

This paper presents an efficient algorithm to solve the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial loss and unknown i.i.d. distributions for arm availability. The proposed algorithm extends the sleeping bandit algorithm for single arm selection and is guaranteed to achieve theoretical performance with regret upper bounded by $O(k N^{2} \sqrt{T \log T})$, where $k$ is the number of arms selected per time step, $N$ is the total number of arms, and $T$ is the time horizon.
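
The interaction described in the abstract can be sketched as a small simulation loop. This is a minimal illustration of the problem setting only, not the paper's algorithm: the placeholder policy picks arms uniformly at random. Several details are assumptions not stated on this page: the loss sequence is fixed in advance (an oblivious adversary), each arm's availability is an independent Bernoulli draw per round, and when fewer than k arms are awake the learner plays all of them. The names `simulate` and `uniform_policy` are hypothetical.

```python
import random

def uniform_policy(awake, k, rng):
    """Placeholder policy: choose up to k awake arms uniformly at random."""
    return rng.sample(awake, min(k, len(awake)))

def simulate(losses, avail_probs, k, policy, seed=0):
    """Run one pass of the sleeping-bandit-with-multiple-plays protocol.

    losses:      list of per-round loss vectors, each entry in [0, 1]
                 (assumed fixed in advance by an oblivious adversary)
    avail_probs: per-arm availability probabilities (unknown to the learner,
                 used here only to sample which arms are awake)
    k:           maximum number of arms played per round
    """
    rng = random.Random(seed)
    n_arms = len(avail_probs)
    total_loss = 0.0
    for loss_t in losses:
        # Availability is drawn i.i.d. across rounds, independently per arm.
        awake = [i for i in range(n_arms) if rng.random() < avail_probs[i]]
        chosen = policy(awake, k, rng)
        if len(chosen) > k or not set(chosen) <= set(awake):
            raise ValueError("policy must pick at most k awake arms")
        # The learner suffers the summed loss of the arms it played.
        total_loss += sum(loss_t[i] for i in chosen)
    return total_loss

if __name__ == "__main__":
    horizon, n_arms, k = 1000, 6, 2
    loss_rng = random.Random(1)
    losses = [[loss_rng.random() for _ in range(n_arms)] for _ in range(horizon)]
    print(simulate(losses, [0.8] * n_arms, k, uniform_policy))
```

A learning algorithm would replace `uniform_policy` with a rule that updates arm weights from the observed losses; regret then compares its cumulative loss with that of a fixed benchmark ranking of arms.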

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms