Data-Driven Upper Confidence Bounds with Near-Optimal Regret for Heavy-Tailed Bandits
Ambrus Tamás, Szabolcs Szentpéteri, Balázs Csanád Csáji

TL;DR
This paper introduces a new data-driven UCB algorithm for heavy-tailed bandits with symmetric reward distributions. The algorithm needs no prior knowledge of distribution moments and achieves a near-optimal regret bound.
Contribution
The paper presents RMM-UCB, a distribution-free, parameter-free, anytime algorithm that combines a refined, one-sided version of the resampled median-of-means (RMM) method with UCB and is suited to heavy-tailed reward distributions.
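For readers unfamiliar with the building block, the following is a minimal sketch of the classical median-of-means estimator. It is not the paper's resampled, one-sided RMM variant, and it does not construct a confidence bound; the function name `median_of_means` and the parameter `num_blocks` are illustrative assumptions, not taken from the paper.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Classical median-of-means estimate of a mean (illustrative sketch).

    The samples are split into `num_blocks` consecutive blocks of equal
    size, each block is averaged, and the median of the block means is
    returned. Samples left over after the last full block are ignored.
    """
    if not 1 <= num_blocks <= len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


# A single extreme reward shifts the plain average but not the estimate.
rewards = [0, 0, 0, 0, 0, 0, 0, 0, 1000]
robust_estimate = median_of_means(rewards, num_blocks=3)
plain_average = statistics.fmean(rewards)
```

Here one outlier corrupts only one of the three block means, so the median of the block means is unaffected. This robustness to rare, very large observations is what makes median-of-means style estimators attractive for heavy-tailed rewards.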
Findings
Achieves near-optimal regret bounds for heavy-tailed bandits.
Does not require prior knowledge of distribution moments.
Assumes symmetric reward distributions.
Abstract
Stochastic multi-armed bandits (MABs) provide a fundamental reinforcement learning model to study sequential decision making in uncertain environments. The upper confidence bounds (UCB) algorithm gave birth to the renaissance of bandit algorithms, as it achieves near-optimal regret rates under various moment assumptions. Up until recently most UCB methods relied on concentration inequalities leading to confidence bounds which depend on moment parameters, such as the variance proxy, that are usually unknown in practice. In this paper, we propose a new distribution-free, data-driven UCB algorithm for symmetric reward distributions, which needs no moment information. The key idea is to combine a refined, one-sided version of the recently developed resampled median-of-means (RMM) method with UCB. We prove a near-optimal regret bound for the proposed anytime, parameter-free RMM-UCB method,…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Sparse and Compressive Sensing Techniques
