Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
Han Zhong, Jiayi Huang, Lin F. Yang, Liwei Wang

TL;DR
This paper introduces a robust estimator called mean of medians for bandit problems with super heavy-tailed noise, achieving near-optimal regret bounds and demonstrating effectiveness through empirical validation.
Contribution
The paper proposes a novel mean of medians estimator and a generic reduction framework that handle super heavy-tailed noise in bandit learning, extending robustness to noise whose moments may not exist.
Findings
Achieves near-optimal regret bounds under super heavy-tailed noise.
The mean of medians estimator effectively filters reward signals in bandit algorithms.
Empirical results confirm the theoretical robustness of the proposed method.
Abstract
Despite a large amount of effort in dealing with heavy-tailed error in machine learning, little is known about the case where moments of the error may not exist: the random noise $\eta$ satisfies $\Pr[|\eta| > |y|] \le 1/|y|^{\alpha}$ for some $\alpha > 0$. We make the first attempt to actively handle such super heavy-tailed noise in bandit learning problems. We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians. We then present a generic reductionist algorithmic framework for solving bandit learning problems (including multi-armed and linear bandit problems): the mean of medians estimator can be applied to nearly any bandit learning algorithm as a black-box filter for its reward signals, yielding a regret bound similar to the one obtained when the reward is sub-Gaussian. We show that…
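The estimator described in the abstract (an empirical mean taken over a sequence of empirical medians) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `group_size` parameter, and the choice of Cauchy noise for the demo are all hypothetical. Cauchy noise is used only because it is symmetric around zero and has no finite mean.

```python
import math
import random
import statistics


def mean_of_medians(samples, group_size):
    """Average the medians of consecutive groups of `group_size` samples.

    `group_size` is a hypothetical tuning parameter for this sketch.
    Leftover samples that do not fill a complete group are dropped.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    num_groups = len(samples) // group_size
    if num_groups == 0:
        raise ValueError("need at least group_size samples")
    medians = [
        statistics.median(samples[i * group_size:(i + 1) * group_size])
        for i in range(num_groups)
    ]
    return statistics.fmean(medians)


def cauchy_noise(rng):
    """Draw standard Cauchy noise by inverse-CDF sampling (demo assumption)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def demo(seed=0, true_reward=1.0, num_samples=3000, group_size=30):
    """Estimate a fixed reward observed through Cauchy noise."""
    rng = random.Random(seed)
    observations = [true_reward + cauchy_noise(rng) for _ in range(num_samples)]
    return mean_of_medians(observations, group_size)


if __name__ == "__main__":
    print("mean of medians estimate:", demo())
```

In a bandit setting, one way to read the "black-box filter" idea is to collect a batch of noisy rewards for the same action, pass the batch through `mean_of_medians`, and hand the single filtered value to the base algorithm. How the batch sizes should be chosen is not stated in the text above, so the values used in `demo` are placeholders.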
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Machine Learning and Algorithms
