$(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits
Gianmarco Genalti, Lupo Marsigli, Nicola Gatti, Alberto Maria Metelli

TL;DR
This paper introduces AdaR-UCB, an adaptive algorithm for heavy-tailed bandit problems with unknown distribution parameters, achieving near-optimal regret guarantees by using a novel trimmed mean estimator.
Contribution
It develops the first adaptive algorithm for heavy-tailed bandits that nearly matches non-adaptive regret bounds without prior knowledge of distribution parameters.
Findings
AdaR-UCB achieves near-optimal regret bounds.
Negative results show adaptation incurs a cost.
A new data-driven trimmed mean estimator is proposed.
Abstract
Heavy-tailed distributions naturally arise in several settings, from finance to telecommunications. While regret minimization under subgaussian or bounded rewards has been widely studied, learning with heavy-tailed distributions only gained popularity over the last decade. In this paper, we consider the setting in which the reward distributions have finite absolute raw moments of maximum order $1+\epsilon$, uniformly bounded by a constant $u < +\infty$, for some $\epsilon \in (0,1]$. In this setting, we study the regret minimization problem when $\epsilon$ and $u$ are unknown to the learner and it has to adapt. First, we show that adaptation comes at a cost and derive two negative results proving that the same regret guarantees of the non-adaptive case cannot be achieved with no further assumptions. Then, we devise and analyze a fully data-driven trimmed mean estimator and propose a novel…
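For context on what "adaptive" removes, the following is a minimal sketch of the classical non-adaptive truncated (trimmed) mean commonly used in heavy-tailed bandits, which needs $\epsilon$ and $u$ as inputs. It is not the data-driven estimator proposed in this paper, whose details are not given in the abstract. The function name `trimmed_mean` and the confidence parameter `delta` are assumptions made for illustration.

```python
import math


def trimmed_mean(samples, epsilon, u, delta):
    """Classical truncated empirical mean with KNOWN (epsilon, u).

    The t-th sample is kept only if |x_t| <= (u * t / log(1/delta))^(1/(1+epsilon));
    otherwise it contributes zero. The sum is still divided by the full
    sample count n. This is the non-adaptive baseline, not the paper's estimator.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not (0.0 < epsilon <= 1.0):
        raise ValueError("epsilon must lie in (0, 1]")
    if u <= 0.0:
        raise ValueError("u must be positive")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie in (0, 1)")

    log_term = math.log(1.0 / delta)
    total = 0.0
    for t, x in enumerate(samples, start=1):
        # Truncation threshold grows with the sample index t.
        threshold = (u * t / log_term) ** (1.0 / (1.0 + epsilon))
        if abs(x) <= threshold:
            total += x
    return total / len(samples)


if __name__ == "__main__":
    # Hypothetical data: three moderate rewards and one extreme outlier.
    rewards = [1.0, 1.0, 1.0, 1000.0]
    print(trimmed_mean(rewards, epsilon=1.0, u=2.0, delta=math.exp(-1)))
```

The threshold depends explicitly on `epsilon` and `u`, which is why this baseline cannot be used when both are unknown to the learner.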
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Distributed Sensor Networks and Detection Algorithms
