Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs

Han Zhong; Jiayi Huang; Lin F. Yang; Liwei Wang

arXiv:2110.13876 · cs.LG · October 27, 2021

TL;DR

This paper introduces a robust estimator called mean of medians for bandit problems with super heavy-tailed noise, achieving near-optimal regret bounds and demonstrating effectiveness through empirical validation.

Contribution

The paper proposes a novel mean of medians estimator and a reduction framework that together handle super heavy-tailed noise in bandit learning, extending robustness to noise whose moments may not exist.

Findings

01 Achieves near-optimal regret bounds under super heavy-tailed noise.

02 The mean of medians estimator effectively filters reward signals in bandit algorithms.

03 Empirical results confirm the theoretical robustness of the proposed method.

Abstract

Despite a large amount of effort in dealing with heavy-tailed error in machine learning, little is known when moments of the error may not exist: the random noise $\eta$ satisfies $\Pr[|\eta| > |y|] \leq 1/|y|^{\alpha}$ for some $\alpha > 0$. We make the first attempt to actively handle such super heavy-tailed noise in bandit learning problems. We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians. We then present a generic reductionist algorithmic framework for solving bandit learning problems (including multi-armed and linear bandit problems): the mean of medians estimator can be applied to nearly any bandit learning algorithm as a black-box filter for its reward signals, yielding a regret bound similar to the one obtained when the reward is sub-Gaussian. We show that…
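
The abstract describes the estimator only as "the empirical mean of a sequence of empirical medians". A minimal Python sketch of that idea follows. It is not the authors' implementation: the grouping into consecutive, non-overlapping blocks, the `group_size` parameter, and the helper `signed_pareto_noise` are assumptions made here for illustration. The demo also assumes symmetric noise, so that the median of a reward coincides with its noiseless value.

```python
import random
import statistics


def mean_of_medians(samples, group_size):
    """Average the medians of consecutive, non-overlapping groups.

    Assumption (not from the paper): samples are split into blocks of
    `group_size`; leftover samples that do not fill a block are dropped.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    n_groups = len(samples) // group_size
    if n_groups == 0:
        raise ValueError("need at least group_size samples")
    medians = [
        statistics.median(samples[i * group_size:(i + 1) * group_size])
        for i in range(n_groups)
    ]
    return statistics.fmean(medians)


def signed_pareto_noise(alpha, rng):
    """Hypothetical noise model: symmetric, with Pr[|eta| > y] = y**(-alpha)
    for y >= 1, so the tail condition in the abstract holds."""
    magnitude = rng.paretovariate(alpha)
    return magnitude if rng.random() < 0.5 else -magnitude


if __name__ == "__main__":
    rng = random.Random(0)
    true_reward = 1.0
    # alpha = 0.5: the noise has no finite mean.
    rewards = [true_reward + signed_pareto_noise(0.5, rng) for _ in range(3000)]
    print("plain mean:     ", statistics.fmean(rewards))
    print("mean of medians:", mean_of_medians(rewards, group_size=30))
```

In a bandit setting, a filter of this kind would sit between the environment and the base algorithm: each reward the base algorithm sees would be a `mean_of_medians` value computed from repeated pulls of the chosen arm, rather than a single raw payoff. How many pulls to use and how to group them is left open in this sketch.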

Videos

Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Machine Learning and Algorithms