Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds
Shinji Ito, Taira Tsuchiya, Junya Honda

TL;DR
This paper introduces a novel best-of-both-worlds multi-armed bandit algorithm that achieves near-optimal regret bounds in both stochastic and adversarial environments, with bounds that also depend on the loss variance of each arm, so that low-variance arms incur less regret.
Contribution
It presents the first BOBW algorithm with gap-variance-dependent regret bounds, leveraging variance information in adversarial settings, and employs adaptive learning rates based on empirical prediction errors.
Findings
Achieves near-optimal gap-variance-dependent regret bounds.
Performs well in both stochastic and adversarial environments.
Provides data-dependent regret bounds that adapt to variance.
Abstract
This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both stochastic and adversarial settings. In stochastic settings, some existing BOBW algorithms achieve tight gap-dependent regret bounds of $O\left(\sum_{i: \Delta_i > 0} \frac{\log T}{\Delta_i}\right)$ for suboptimality gap $\Delta_i$ of arm $i$ and time horizon $T$. As Audibert et al. [2007] have shown, however, performance can be improved in stochastic environments with low-variance arms. In fact, they have provided a stochastic MAB algorithm with gap-variance-dependent regret bounds of $O\left(\sum_{i: \Delta_i > 0} \left(\frac{\sigma_i^2}{\Delta_i} + 1\right) \log T\right)$ for loss variance $\sigma_i^2$ of arm $i$. In this paper, we propose the first BOBW algorithm with gap-variance-dependent bounds, showing that the variance information can be used even in the…
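To make the difference between the two stochastic bounds concrete, the sketch below evaluates both bound shapes on a small instance: the gap-dependent form, sum of log T / Δ_i over suboptimal arms, and the gap-variance-dependent form, sum of (σ_i² / Δ_i + 1) log T over suboptimal arms. The constants hidden by the O-notation are dropped, and the gaps, variances, horizon, and function names are hypothetical values chosen for illustration, not taken from the paper. It compares the bound expressions only and does not implement the proposed algorithm.

```python
import math


def gap_dependent_bound(gaps, horizon):
    """Sum of log(T) / gap over arms with a positive gap (constants dropped)."""
    return sum(math.log(horizon) / g for g in gaps if g > 0)


def gap_variance_bound(gaps, variances, horizon):
    """Sum of (variance / gap + 1) * log(T) over arms with a positive gap."""
    return sum(
        (v / g + 1.0) * math.log(horizon)
        for g, v in zip(gaps, variances)
        if g > 0
    )


# Hypothetical three-arm instance: arm 0 is optimal (gap 0),
# and the two suboptimal arms have small gaps and low loss variance.
gaps = [0.0, 0.05, 0.1]
variances = [0.01, 0.001, 0.002]
horizon = 10_000

plain = gap_dependent_bound(gaps, horizon)
with_variance = gap_variance_bound(gaps, variances, horizon)

print(f"gap-dependent shape:          {plain:.2f}")
print(f"gap-variance-dependent shape: {with_variance:.2f}")
```

On this instance the variance-aware expression is much smaller, because each σ_i² / Δ_i term is far below 1 / Δ_i. When variances are close to their maximum, the two expressions are of the same order.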
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
