Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

Jiatai Huang; Yan Dai; Longbo Huang

arXiv:2201.11921·cs.LG·June 14, 2022·1 cites

TL;DR

This paper introduces algorithms for heavy-tailed multi-armed bandits that achieve optimal regret in both stochastic and adversarial environments when the tail parameters $\alpha$ and $\sigma$ are known, and that retain regret guarantees in both environments when those parameters are unknown.

Contribution

The paper presents the first algorithms achieving best-of-both-worlds regret guarantees for heavy-tailed MABs, adapting to unknown tail parameters and environment types.

Findings

01

\texttt{HTINF} achieves optimal regret in known-parameter settings.

02

\texttt{HTINF} attains near-optimal regret without prior environment knowledge.

03

\texttt{AdaTINF} matches lower bounds in adversarial heavy-tailed bandits.

Abstract

In this paper, we generalize the concept of heavy-tailed multi-armed bandits (MAB) to adversarial environments and develop robust best-of-both-worlds algorithms for this setting, where losses have $\alpha$-th ($1 < \alpha \leq 2$) moments bounded by $\sigma^{\alpha}$, while the variances may not exist. Specifically, we design an algorithm \texttt{HTINF}. When the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori. When $\alpha$ and $\sigma$ are unknown, \texttt{HTINF} achieves a $\log T$-style instance-dependent regret in stochastic cases and an $o(T)$ no-regret guarantee in adversarial cases. We further develop an algorithm \texttt{AdaTINF}, achieving $\mathcal O(\sigma K^{1-\nicefrac…
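To make the heavy-tail assumption in the abstract concrete, the following is a minimal Python sketch, not taken from the paper. It uses a Pareto distribution with scale 1 purely as an illustrative loss distribution: for a shape parameter strictly between $\alpha$ and 2, the $\alpha$-th moment is finite (so a valid $\sigma$ exists) while the variance does not exist. The function names `pareto_moment`, `sigma_bound`, and `sample_losses` are hypothetical helpers introduced for this illustration.

```python
import math
import random


def pareto_moment(shape: float, p: float) -> float:
    """Closed-form E[X^p] for a Pareto variable with scale 1.

    The moment equals shape / (shape - p) when p < shape and is
    infinite otherwise.
    """
    if p >= shape:
        return math.inf
    return shape / (shape - p)


def sigma_bound(shape: float, alpha: float) -> float:
    """Smallest sigma with E[|X|^alpha] <= sigma^alpha for this Pareto law.

    Illustrative assumption: the loss distribution is Pareto(shape), scale 1.
    """
    return pareto_moment(shape, alpha) ** (1.0 / alpha)


def sample_losses(shape: float, n: int, seed: int = 0) -> list:
    """Draw n heavy-tailed losses using the standard-library Pareto sampler."""
    rng = random.Random(seed)
    return [rng.paretovariate(shape) for _ in range(n)]


if __name__ == "__main__":
    shape, alpha = 1.8, 1.5  # alpha < shape < 2: heavy-tailed regime
    print("alpha-th moment:", pareto_moment(shape, alpha))
    print("sigma:", sigma_bound(shape, alpha))
    print("second moment:", pareto_moment(shape, 2.0))  # infinite
    print("largest of 10000 draws:", max(sample_losses(shape, 10_000)))
```

With `shape = 1.8` and `alpha = 1.5`, the second moment is infinite, so variance-based concentration arguments do not apply, yet the $\alpha$-th moment bound that the abstract assumes still holds.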

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics