Catoni-Style Change Point Detection for Regret Minimization in Non-Stationary Heavy-Tailed Bandits
Gianmarco Genalti, Sujay Bhatt, Nicola Gatti, Alberto Maria Metelli

TL;DR
This paper introduces a Catoni-style change-point detection method for non-stationary bandits with heavy-tailed rewards and uses it to minimize regret in the piecewise-stationary setting.
Contribution
The paper proposes a Catoni-style change-point detection strategy and the Robust-CPD-UCB algorithm for heavy-tailed, piecewise-stationary bandit problems, and provides theoretical regret bounds.
Findings
Robust-CPD-UCB achieves near-optimal regret bounds.
The method effectively detects change points in heavy-tailed environments.
Numerical experiments confirm the approach's effectiveness on real-world data.
Abstract
Regret minimization in stochastic non-stationary bandits has gained popularity over the last decade, as it can model a broad class of real-world problems, from advertising to recommendation systems. Existing literature relies on various assumptions about the reward-generating process, such as Bernoulli or subgaussian rewards. However, in settings such as finance and telecommunications, heavy-tailed distributions naturally arise. In this work, we tackle the heavy-tailed piecewise-stationary bandit problem. Heavy-tailed bandits, introduced by Bubeck et al. (2013), operate on the minimal assumption that the finite absolute centered moments of maximum order $1+\epsilon$ are uniformly bounded by a constant $v < +\infty$, for some $\epsilon \in (0,1]$. We focus on the most popular non-stationary bandit setting, i.e., the piecewise-stationary setting, in which the mean of reward-generating…
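To illustrate why heavy-tailed rewards call for robust estimation, the following is a minimal sketch of the classical Catoni M-estimator of the mean. It is not the paper's Robust-CPD-UCB algorithm or its change-point test, which the truncated abstract above does not specify. The sketch assumes the finite-variance case and uses Catoni's standard influence function; the names catoni_psi and catoni_mean and the scale parameter alpha are hypothetical and chosen only for this example.

```python
import math


def catoni_psi(x: float) -> float:
    """Catoni's influence function: sign(x) * log(1 + |x| + x^2 / 2)."""
    return math.copysign(math.log1p(abs(x) + 0.5 * x * x), x)


def catoni_mean(samples, alpha, tol=1e-9, max_iter=200):
    """Solve sum_i psi(alpha * (x_i - theta)) = 0 for theta by bisection.

    The score is non-increasing in theta, non-negative at min(samples)
    and non-positive at max(samples), so a root lies in that interval.
    `alpha` (assumed > 0) sets how aggressively large deviations are damped.
    """
    lo, hi = min(samples), max(samples)
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        score = sum(catoni_psi(alpha * (x - mid)) for x in samples)
        if score > 0:
            lo = mid  # theta is too small: move right
        else:
            hi = mid  # theta is too large (or exact): move left
    return 0.5 * (lo + hi)


# Illustrative data: 99 zeros and a single extreme observation.
data = [0.0] * 99 + [1000.0]
empirical = sum(data) / len(data)  # 10.0, dominated by the outlier
robust = catoni_mean(data, alpha=1.0)
print(empirical, robust)
```

On this illustrative sample the single extreme value pulls the empirical mean to 10, while the Catoni estimate stays close to the bulk of the data because the influence function grows only logarithmically. A change-point detector built on such an estimator would compare robust means across time windows, but the exact statistic and thresholds used in the paper are not given here.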
Taxonomy
Topics: Advanced Bandit Algorithms Research
