A Simple and Optimal Policy Design with Safety against Heavy-Tailed Risk for Stochastic Bandits

David Simchi-Levi; Zeyu Zheng; Feng Zhu

arXiv:2206.02969·stat.ML·July 23, 2024·1 cites

Open Access

TL;DR

This paper introduces a new policy for stochastic multi-armed bandits that achieves optimal worst-case expected regret and provides strong tail bounds on the regret distribution, by exploring more at the beginning of the time horizon and exploiting more near its end than standard confidence-bound-based policies.

Contribution

The paper proposes a novel policy that is worst-case optimal for expected regret and whose regret tail probability bound is the best achievable with respect to $T$ among worst-case optimal policies, with extensions to an unknown horizon and to linear bandits.

Findings

01

Achieves an $O(\sqrt{KT\ln T})$ worst-case expected regret bound.

02

Provides exponential tail bounds on the regret distribution.

03

Outperforms existing policies in tail risk and hyper-parameter tuning.
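
As a rough illustration of what "expected regret" and "tail risk" measure in the findings above, the sketch below simulates a textbook UCB1 policy on Bernoulli arms and estimates both quantities empirically. This is not the policy proposed in the paper; the function names (`ucb1_pseudo_regret`, `empirical_tail`, `simulate`), the arm means, the horizon, and the number of runs are all hypothetical choices made for the example.

```python
import math
import random


def ucb1_pseudo_regret(means, horizon, rng):
    """Run a textbook UCB1 policy (NOT the paper's policy) on Bernoulli arms.

    Returns the pseudo-regret: the sum over rounds of the gap between the
    best arm's mean and the mean of the arm actually pulled.
    Assumes horizon >= len(means) so every arm is pulled at least once.
    """
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initial round-robin: pull each arm once
        else:
            # empirical mean plus the standard UCB1 confidence bonus
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def empirical_tail(regrets, x):
    """Fraction of simulated runs whose regret exceeds x."""
    return sum(r > x for r in regrets) / len(regrets)


def simulate(means, horizon, runs, seed=0):
    """Collect pseudo-regret over independent runs with a seeded generator."""
    rng = random.Random(seed)
    return [ucb1_pseudo_regret(means, horizon, rng) for _ in range(runs)]


if __name__ == "__main__":
    regrets = simulate(means=[0.5, 0.6], horizon=2000, runs=200)
    print("mean regret:", sum(regrets) / len(regrets))
    for x in (10, 20, 40, 80):
        print("P(regret >", x, ") ~", empirical_tail(regrets, x))
```

The mean of `regrets` estimates the expected regret, while `empirical_tail` estimates the probability of a large regret; a light-tailed policy in the paper's sense is one for which that probability decays exponentially in the threshold.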

Abstract

We study the stochastic multi-armed bandit problem and design new policies that enjoy both worst-case optimality for expected regret and light-tailed risk for regret distribution. Specifically, our policy design (i) enjoys the worst-case optimality for the expected regret at order $O(\sqrt{KT\ln T})$ and (ii) has the worst-case tail probability of incurring a regret larger than any $x > 0$ being upper bounded by $\exp(-\Omega(x/\sqrt{KT}))$, a rate that we prove to be best achievable with respect to $T$ for all worst-case optimal policies. Our proposed policy achieves a delicate balance between doing more exploration at the beginning of the time horizon and doing more exploitation when approaching the end, compared to standard confidence-bound-based policies. We also enhance the policy design to accommodate the "any-time" setting where $T$ is unknown a priori, and prove equivalently…

Taxonomy

Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Age of Information Optimization