Extended UCB Policies for Multi-armed Bandit Problems

Keqin Liu; Tianshuo Zheng; Zhi-Hua Zhou

arXiv:1112.1768 · cs.LG · September 16, 2025 · 1 citation



Open Access

TL;DR

This paper extends UCB policies to handle heavy-tailed reward distributions in multi-armed bandit problems, achieving near-optimal regret without prior distribution knowledge, broadening practical applicability.

Contribution

It generalizes existing robust UCB policies from fixed moment orders to arbitrary orders $p > q > 1$, handling heavy-tailed rewards under minimal distributional assumptions.

Findings

1. Achieves the optimal regret growth order $O(\log T)$ for heavy-tailed rewards.
2. Extends UCB policies to arbitrary moment orders $p > q > 1$, provided the two moments have a known controlled relationship.
3. Maintains near-optimal regret without prior knowledge of the reward distribution.
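
For reference, the regret behind the $O(\log T)$ claim is usually defined as below. The notation is standard and assumed here rather than taken from the paper: $K$ arms with means $\mu_i$, best mean $\mu^*$, gaps $\Delta_i = \mu^* - \mu_i$, $I_t$ the arm played at round $t$, and $N_i(T)$ the number of pulls of arm $i$ in $T$ rounds.

$$
R_T = T\mu^* - \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{I_t}\Big] = \sum_{i:\,\Delta_i > 0} \Delta_i \, \mathbb{E}\big[N_i(T)\big]
$$

With the gaps held fixed, a bound $R_T = O(\log T)$ therefore means that each suboptimal arm is pulled only $O(\log T)$ times in expectation.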

Abstract

Multi-armed bandit (MAB) problems are widely studied in operations research, stochastic optimization, and reinforcement learning. In this paper, we consider the classical MAB model with heavy-tailed reward distributions and introduce the extended robust UCB policy, which extends the results of Bubeck et al. [5] and Lattimore [22], themselves built on the pioneering idea of UCB policies [e.g. Auer et al. 3]. The previous UCB policies require strict conditions on the reward distributions, which can be difficult to guarantee in practice. Our extended robust UCB generalizes Lattimore's seminal work (for moments of orders $p = 4$ and $q = 2$) to arbitrarily chosen $p > q > 1$, as long as the two moments have a known controlled relationship, while still achieving the optimal regret growth order $O(\log T)$, thus providing a broadened application area of UCB…
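
The abstract does not spell out the extended policy, so as a point of reference here is a minimal Python sketch of the earlier robust UCB idea it builds on (Bubeck et al. [5]): each arm's index is a truncated empirical mean plus a confidence bonus that needs only an assumed bound $v$ on the $(1+\varepsilon)$-th raw moment $\mathbb{E}|X|^{1+\varepsilon}$. This is not the authors' extended robust UCB, which works with two moments of orders $p > q > 1$. The constant 4, the confidence level $\delta = t^{-2}$, the function names, and the Pareto-tailed test arms are illustrative assumptions.

```python
import math
import random
from typing import Callable, List

# An arm is modeled as a function that draws one reward from a seeded RNG.
Arm = Callable[[random.Random], float]


def truncated_mean(samples: List[float], v: float, eps: float, delta: float) -> float:
    """Truncated empirical mean: the s-th sample counts only if
    |X_s| <= (v * s / log(1/delta))^(1/(1+eps)); otherwise it contributes 0."""
    log_inv_delta = math.log(1.0 / delta)
    total = 0.0
    for s, x in enumerate(samples, start=1):
        if abs(x) <= (v * s / log_inv_delta) ** (1.0 / (1.0 + eps)):
            total += x
    return total / len(samples)


def robust_ucb_index(samples: List[float], t: int, v: float, eps: float) -> float:
    """Truncated mean plus a bonus of order (log t / s)^(eps/(1+eps)).
    The constant 4 and delta = t^-2 are assumed values for this sketch,
    not parameters taken from this paper."""
    delta = float(t) ** -2
    s = len(samples)
    bonus = 4.0 * v ** (1.0 / (1.0 + eps)) * (math.log(1.0 / delta) / s) ** (eps / (1.0 + eps))
    return truncated_mean(samples, v, eps, delta) + bonus


def robust_ucb(arms: List[Arm], horizon: int, v: float, eps: float, seed: int = 0) -> List[int]:
    """Run the sketch for `horizon` rounds and return how often each arm was pulled."""
    rng = random.Random(seed)
    history: List[List[float]] = [[] for _ in arms]
    for t in range(1, horizon + 1):
        if t <= len(arms):
            arm = t - 1  # initialization: play every arm once
        else:
            # Indices are recomputed from scratch each round because the
            # truncation thresholds depend on t through delta.
            arm = max(range(len(arms)), key=lambda i: robust_ucb_index(history[i], t, v, eps))
        history[arm].append(arms[arm](rng))
    return [len(h) for h in history]


# Hypothetical example: two Pareto(2.5)-tailed arms with means about 0.83 and 1.67.
# For the unscaled arm E|X|^1.5 = 2.5 / (2.5 - 1.5) = 2.5, so v = 2.5 with eps = 0.5
# bounds the 1.5-th raw moment of both arms.
arms: List[Arm] = [lambda r: 0.5 * r.paretovariate(2.5), lambda r: r.paretovariate(2.5)]
pulls = robust_ucb(arms, horizon=1000, v=2.5, eps=0.5)
print(pulls)
```

Because the truncation thresholds change with the round $t$, the sketch recomputes every truncated mean each round. That costs $O(t)$ work per arm per round, which is acceptable for a demonstration but quadratic over the horizon.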

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems