One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits

Pierre Gaillard; Aadirupa Saha; Soham Dan

arXiv:2210.14998 · cs.LG · October 28, 2022

TL;DR

This paper unifies existing notions of regret in sleeping bandits: it proposes a new notion of sleeping internal regret, gives an algorithm with sublinear regret in that measure, and extends the results to dueling bandits, with empirical validation.

Contribution

It unifies various notions of regret in sleeping bandits, introduces a new internal regret concept, and extends the framework to dueling bandits with stochastic preferences.

Findings

01 Proposed an algorithm with sublinear sleeping internal regret, even under adversarial losses and availabilities.

02 Established that low sleeping internal regret implies low external regret and, for i.i.d. losses, low policy regret.

03 Extended the results to sleeping dueling bandits, with empirical validation.

Abstract

We address the problem of "Internal Regret" in Sleeping Bandits in the fully adversarial setup, and draw connections between the different existing notions of sleeping regret in the multi-armed bandits (MAB) literature, analyzing their implications. Our first contribution is a new notion of Internal Regret for sleeping MAB. We then propose an algorithm that yields sublinear regret in that measure, even for a completely adversarial sequence of losses and availabilities. We further show that a low sleeping internal regret always implies a low external regret, as well as a low policy regret for i.i.d. sequences of losses. The main contribution of this work lies precisely in unifying the different existing notions of regret in sleeping bandits and understanding how each implies the others. Finally, we also extend our results to the setting of…
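The abstract does not spell out its regret definitions, so the following is only a minimal sketch under an assumed formalization, not the paper's own code or algorithm. It assumes that sleeping external regret against arm j is summed over the rounds where j is available, and that sleeping internal regret for a swap i -> j is summed over the rounds where i was played and j was available. The function name `sleeping_regrets` and the toy data are hypothetical. Under these assumptions, the external regret against j equals the sum of the internal regrets of all swaps into j, which gives some intuition for why low internal regret can imply low external regret.

```python
from itertools import permutations


def sleeping_regrets(losses, available, played):
    """Compute assumed sleeping external and internal regrets.

    losses[t][k]  : loss of arm k at round t
    available[t]  : set of arms awake at round t
    played[t]     : arm chosen at round t (must be awake)
    """
    n_rounds = len(losses)
    n_arms = len(losses[0])

    # Assumed external regret against arm j: only rounds where j is awake count.
    external = {
        j: sum(
            losses[t][played[t]] - losses[t][j]
            for t in range(n_rounds)
            if j in available[t]
        )
        for j in range(n_arms)
    }

    # Assumed internal regret for the swap i -> j: rounds where i was played
    # and j was awake, so the swap was actually feasible.
    internal = {
        (i, j): sum(
            losses[t][i] - losses[t][j]
            for t in range(n_rounds)
            if played[t] == i and j in available[t]
        )
        for i, j in permutations(range(n_arms), 2)
    }
    return external, internal


# Hypothetical toy instance: 3 arms, 4 rounds.
losses = [
    [0.5, 0.25, 1.0],
    [0.0, 0.75, 0.5],
    [1.0, 0.5, 0.25],
    [0.25, 0.0, 0.5],
]
available = [{0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]
played = [0, 2, 1, 0]

external, internal = sleeping_regrets(losses, available, played)

for j in range(3):
    swaps_into_j = sum(internal[(i, j)] for i in range(3) if i != j)
    print(f"arm {j}: external={external[j]}, sum of internal swaps={swaps_into_j}")
```

On any input, each external regret printed here matches the sum of the internal regrets for swaps into the same arm, so with K arms the external regret is at most K times the largest internal regret under this assumed formalization. The policy regret statement for i.i.d. losses is not covered by this sketch.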

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Misinformation and Its Impacts