Minimax Policy for Heavy-tailed Bandits

Lai Wei; Vaibhav Srivastava

arXiv:2007.10493 · stat.ML · November 19, 2020

TL;DR

This paper introduces Robust MOSS, an algorithm for heavy-tailed bandits that achieves optimal worst-case regret whenever the reward distributions have a finite moment of order $1 + \epsilon$.

Contribution

It extends the minimax policy MOSS to heavy-tailed rewards using saturated empirical means, achieving optimal worst-case regret.

Findings

01

Robust MOSS matches the lower bound for worst-case regret.

02

The algorithm maintains distribution-dependent logarithmic regret.

03

These guarantees require only that the reward distributions have a finite moment of order $1 + \epsilon$.

Abstract

We study the stochastic Multi-Armed Bandit (MAB) problem under worst-case regret and heavy-tailed reward distributions. We modify the minimax policy MOSS, designed for sub-Gaussian reward distributions, by using a saturated empirical mean to design a new algorithm called Robust MOSS. We show that if the moment of order $1 + \epsilon$ of the reward distribution exists, then the refined strategy has a worst-case regret matching the lower bound while maintaining a distribution-dependent logarithmic regret.
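To make the idea of a saturated empirical mean concrete, here is a minimal sketch in Python. It assumes that "saturation" means clipping each observed reward to a symmetric interval before averaging. The abstract does not state how Robust MOSS chooses its saturation limits, so the `limits` argument is a hypothetical input supplied by the caller, and the function names are illustrative rather than taken from the paper.

```python
from typing import Sequence


def saturate(x: float, limit: float) -> float:
    """Clip x to the interval [-limit, limit]."""
    return max(-limit, min(limit, x))


def saturated_mean(rewards: Sequence[float], limits: Sequence[float]) -> float:
    """Average of rewards after clipping each one to its own limit.

    `limits` is an assumption of this sketch: the paper's actual
    saturation schedule is not given in the abstract.
    """
    if len(rewards) == 0 or len(rewards) != len(limits):
        raise ValueError("rewards and limits must be non-empty and equal in length")
    clipped = [saturate(x, b) for x, b in zip(rewards, limits)]
    return sum(clipped) / len(clipped)


# One extreme sample dominates the plain mean but not the saturated one.
rewards = [0.5, -0.2, 100.0, 0.1]
limits = [2.0] * len(rewards)  # hypothetical constant limit, for illustration only
plain = sum(rewards) / len(rewards)
robust = saturated_mean(rewards, limits)
print(plain, robust)
```

In this toy example the single outlier of 100.0 pulls the plain average far from the bulk of the samples, while the clipped average stays close to them. This is the kind of robustness a heavy-tailed bandit policy needs from its mean estimator.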

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms