Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms

Tiancheng Jin; Junyan Liu; Haipeng Luo

arXiv:2302.13534 · cs.LG · October 27, 2023 · 1 citation

TL;DR

This paper advances multi-armed bandit algorithms by generalizing FTRL to work without the uniqueness of the optimal arm, achieving optimal performance in both stochastic and adversarial settings with broader regularizers.

Contribution

The paper generalizes and improves FTRL-based algorithms with best-of-both-worlds guarantees, removing the need for a unique optimal arm and introducing a new learning rate schedule.

Findings

01. FTRL with a broad family of regularizers can adapt to both stochastic and adversarial environments.

02. The new approach removes the uniqueness assumption for the optimal arm.

03. Regret bounds are improved for some regularizers even when the uniqueness condition holds.

Abstract

We study the problem of designing adaptive multi-armed bandit algorithms that perform optimally in both the stochastic setting and the adversarial setting simultaneously (often known as a best-of-both-world guarantee). A line of recent works shows that when configured and analyzed properly, the Follow-the-Regularized-Leader (FTRL) algorithm, originally designed for the adversarial setting, can in fact optimally adapt to the stochastic setting as well. Such results, however, critically rely on an assumption that there exists one unique optimal arm. Recently, Ito (2021) took the first step to remove such an undesirable uniqueness assumption for one particular FTRL algorithm with the $\frac{1}{2}$-Tsallis entropy regularizer. In this work, we significantly improve and generalize this result, showing that uniqueness is unnecessary for FTRL with a broad family of regularizers and a new…
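
To make the setting in the abstract concrete, the following is a minimal sketch of FTRL with a $\frac{1}{2}$-Tsallis entropy regularizer on a stochastic bandit instance that has two optimal arms. It is not the paper's algorithm: the regularizer scaling $-\frac{2}{\eta_t}\sum_i \sqrt{p_i}$, the learning rate $\eta_t = 1/\sqrt{t}$, the importance-weighted loss estimator, the Bernoulli losses, and the names `tsallis_ftrl_probs` and `run_bandit` are all assumptions chosen for illustration, and the paper's new learning rate schedule is not reproduced here.

```python
import math
import random


def tsallis_ftrl_probs(cum_loss, eta, iters=60):
    """Approximately solve argmin_p <p, cum_loss> - (2/eta) * sum_i sqrt(p_i)
    over the probability simplex (assumed regularizer scaling).

    Stationarity gives p_i = 1 / (eta * (cum_loss_i + lam))**2; the
    normalizer lam is found by bisection so that the p_i sum to one.
    """
    k = len(cum_loss)
    base = min(cum_loss)
    # With mu = lam + base: at mu = 1/eta the sum is >= 1,
    # at mu = sqrt(k)/eta the sum is <= 1, and it decreases in mu.
    lo, hi = 1.0 / eta, math.sqrt(k) / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(1.0 / (eta * (l - base + mid)) ** 2 for l in cum_loss)
        if total > 1.0:
            lo = mid
        else:
            hi = mid
    probs = [1.0 / (eta * (l - base + hi)) ** 2 for l in cum_loss]
    s = sum(probs)  # renormalize to absorb the remaining bisection error
    return [p / s for p in probs]


def run_bandit(means, horizon, seed=0):
    """Run the sketch on Bernoulli losses with the given mean losses.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(means)
    cum_loss = [0.0] * k  # cumulative importance-weighted loss estimates
    pulls = [0] * k
    for t in range(1, horizon + 1):
        eta = 1.0 / math.sqrt(t)  # assumed schedule, not the paper's
        probs = tsallis_ftrl_probs(cum_loss, eta)
        arm = rng.choices(range(k), weights=probs)[0]
        loss = 1.0 if rng.random() < means[arm] else 0.0
        # Unbiased estimator: only the pulled arm's estimate is updated.
        cum_loss[arm] += loss / probs[arm]
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Arms 0 and 1 share the smallest mean loss, so the optimal arm is not unique.
    print(run_bandit([0.2, 0.2, 0.6], horizon=2000))
```

The instance in the `__main__` block deliberately has two arms with the same smallest mean loss, which is the case the uniqueness assumption rules out. The sketch only shows the mechanics of the update; it makes no claim about the regret bounds established in the paper.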

Videos

Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics