Data-Driven Upper Confidence Bounds with Near-Optimal Regret for Heavy-Tailed Bandits

Ambrus Tamás; Szabolcs Szentpéteri; Balázs Csanád Csáji

arXiv:2406.05710 · cs.LG · June 11, 2024

TL;DR

This paper introduces a new data-driven UCB algorithm for heavy-tailed bandits that does not require prior knowledge of distribution moments, achieving near-optimal regret bounds.

Contribution

It presents a distribution-free, parameter-free UCB algorithm combining resampled median-of-means with UCB, suitable for heavy-tailed reward distributions.

Findings

01

Achieves near-optimal regret bounds for heavy-tailed bandits.

02

Does not require prior knowledge of distribution moments.

03

Assumes symmetric reward distributions.

Abstract

Stochastic multi-armed bandits (MABs) provide a fundamental reinforcement learning model to study sequential decision making in uncertain environments. The upper confidence bounds (UCB) algorithm gave birth to the renaissance of bandit algorithms, as it achieves near-optimal regret rates under various moment assumptions. Up until recently most UCB methods relied on concentration inequalities leading to confidence bounds which depend on moment parameters, such as the variance proxy, that are usually unknown in practice. In this paper, we propose a new distribution-free, data-driven UCB algorithm for symmetric reward distributions, which needs no moment information. The key idea is to combine a refined, one-sided version of the recently developed resampled median-of-means (RMM) method with UCB. We prove a near-optimal regret bound for the proposed anytime, parameter-free RMM-UCB method,…
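For readers unfamiliar with the building block named in the abstract, the following is a minimal sketch of the standard median-of-means estimator. It is not the paper's refined, one-sided resampled (RMM) variant, and it is not the RMM-UCB index; the function name `median_of_means` and the parameter `num_blocks` are hypothetical choices made for illustration only.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Standard median-of-means estimate of a mean.

    Illustrative only: this is the classical estimator, not the
    resampled one-sided variant proposed in the paper.
    """
    if not 1 <= num_blocks <= len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    # Average each disjoint block; samples left over after the last block are dropped.
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    # The median of the block means is robust to a few extreme observations.
    return statistics.median(block_means)


# Hypothetical rewards with a single extreme outlier.
rewards = [1.0, 2.0, 3.0, 1000.0, 2.0, 3.0]
print(statistics.fmean(rewards))     # 168.5, dominated by the outlier
print(median_of_means(rewards, 3))   # 2.5, block means are 1.5, 501.5, 2.5
```

The outlier corrupts only one block mean, so the median ignores it. This robustness is what makes median-of-means style estimators attractive for heavy-tailed rewards.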

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Sparse and Compressive Sensing Techniques