Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds

Shinji Ito; Taira Tsuchiya; Junya Honda

arXiv:2206.06810·cs.LG·June 15, 2022·1 cites

TL;DR

This paper introduces a best-of-both-worlds multi-armed bandit algorithm that achieves near-optimal regret in both stochastic and adversarial environments, with stochastic-setting bounds that depend on the loss variance of each arm as well as its suboptimality gap.

Contribution

It presents the first best-of-both-worlds (BOBW) algorithm with gap-variance-dependent regret bounds, leveraging variance information in adversarial settings and employing adaptive learning rates based on empirical prediction errors.

Findings

1. Achieves near-optimal gap-variance-dependent regret bounds.

2. Performs well in both stochastic and adversarial environments.

3. Provides data-dependent regret bounds that adapt to variance.

Abstract

This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both stochastic and adversarial settings. In stochastic settings, some existing BOBW algorithms achieve tight gap-dependent regret bounds of $O\left(\sum_{i : \Delta_i > 0} \frac{\log T}{\Delta_i}\right)$ for suboptimality gap $\Delta_i$ of arm $i$ and time horizon $T$. As Audibert et al. [2007] have shown, however, the performance can be improved in stochastic environments with low-variance arms. In fact, they have provided a stochastic MAB algorithm with gap-variance-dependent regret bounds of $O\left(\sum_{i : \Delta_i > 0} \left(\frac{\sigma_i^2}{\Delta_i} + 1\right) \log T\right)$ for loss variance $\sigma_i^2$ of arm $i$. In this paper, we propose the first BOBW algorithm with gap-variance-dependent bounds, showing that the variance information can be used even in the…
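To make the difference between the two stochastic bounds quoted in the abstract concrete, the following minimal sketch evaluates both sums on a small, hypothetical instance. The gaps, variances, and horizon are illustrative values and do not come from the paper, and the constants hidden by the $O(\cdot)$ notation are dropped, so only the relative size of the two quantities is meaningful.

```python
import math


def gap_dependent_bound(gaps, horizon):
    """Sum of log(T) / gap over suboptimal arms (big-O constants omitted)."""
    return sum(math.log(horizon) / g for g in gaps if g > 0)


def gap_variance_bound(gaps, variances, horizon):
    """Sum of (variance / gap + 1) * log(T) over suboptimal arms (constants omitted)."""
    return sum(
        (v / g + 1.0) * math.log(horizon)
        for g, v in zip(gaps, variances)
        if g > 0
    )


# Hypothetical instance (assumed for illustration only):
# arm 0 is optimal (gap 0); the other arms have small gaps and low variance.
gaps = [0.0, 0.05, 0.1, 0.2]
variances = [0.01, 0.001, 0.001, 0.001]
T = 10_000

plain = gap_dependent_bound(gaps, T)
with_var = gap_variance_bound(gaps, variances, T)

print(f"gap-dependent term:          {plain:.2f}")
print(f"gap-variance-dependent term: {with_var:.2f}")
```

For a single arm, the variance-aware term $(\sigma_i^2/\Delta_i + 1)\log T$ is smaller than $\log T/\Delta_i$ exactly when $\sigma_i^2 + \Delta_i < 1$, which is why low-variance arms with small gaps benefit most in this toy comparison.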

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms