Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits
Jiatai Huang, Yan Dai, Longbo Huang

TL;DR
This paper introduces adaptive algorithms for heavy-tailed multi-armed bandits that perform optimally in both stochastic and adversarial environments, even when key parameters are unknown, advancing the robustness and adaptability of bandit algorithms.
Contribution
The paper presents the first algorithms achieving best-of-both-worlds regret guarantees for heavy-tailed MABs, adapting to unknown tail parameters and environment types.
Findings
\texttt{HTINF} achieves optimal regret in known-parameter settings.
\texttt{HTINF} attains near-optimal regret without prior environment knowledge.
\texttt{AdaTINF} matches lower bounds in adversarial heavy-tailed bandits.
Abstract
In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have $\alpha$-th ($1<\alpha\le 2$) moments bounded by $\sigma^\alpha$, while the variances may not exist. Specifically, we design an algorithm \texttt{HTINF}, when the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a-priori. When $\alpha, \sigma$ are unknown, \texttt{HTINF} achieves a $\log T$-style instance-dependent regret in stochastic cases and no-regret guarantee in adversarial cases. We further develop an algorithm \texttt{AdaTINF}, achieving $\mathcal O(\sigma K^{1-\nicefrac…
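The moment assumption in the abstract can be made concrete with a small sketch. The code below is an illustration only and is not taken from the paper: it assumes Pareto-distributed losses with scale 1 and a hypothetical tail index of 1.5, writes `alpha` for the moment order and `sigma` for the scale whose `alpha`-th power bounds the `alpha`-th moment, and uses the helper names `pareto_raw_moment` and `sample_pareto_losses`, which are invented here. It shows a loss distribution whose `alpha`-th moment is finite for `alpha = 1.2` while its second moment, and hence its variance, is infinite.

```python
import math
import random


def pareto_raw_moment(shape: float, p: float) -> float:
    """p-th raw moment of a Pareto variable with scale 1 and the given shape.

    The closed form is shape / (shape - p) when p < shape; the moment
    diverges otherwise.
    """
    if p < shape:
        return shape / (shape - p)
    return math.inf


def sample_pareto_losses(shape: float, n: int, seed: int = 0) -> list[float]:
    """Draw n heavy-tailed losses (all >= 1) from a Pareto distribution."""
    rng = random.Random(seed)
    return [rng.paretovariate(shape) for _ in range(n)]


# Hypothetical values chosen for illustration (not from the paper).
shape = 1.5   # tail index of the assumed loss distribution
alpha = 1.2   # moment order, with 1 < alpha <= 2

# Smallest sigma such that E[loss**alpha] <= sigma**alpha holds.
sigma = pareto_raw_moment(shape, alpha) ** (1.0 / alpha)

# The second moment diverges, so the variance does not exist.
second_moment = pareto_raw_moment(shape, 2.0)

losses = sample_pareto_losses(shape, n=1000)
print(f"alpha-th moment bound sigma = {sigma:.3f}")
print(f"second moment finite: {math.isfinite(second_moment)}")
print(f"largest sampled loss: {max(losses):.1f}")
```

Any distribution with a tail index strictly between `alpha` and 2 would serve the same purpose; Pareto is used here only because its moments have a simple closed form.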
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
