Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards

Artin Tajdini; Jonathan Scarlett; Kevin Jamieson

arXiv:2506.04775 · cs.LG · January 28, 2026

Abstract

We study stochastic linear bandits with heavy-tailed rewards, where the rewards have a finite $(1 + \epsilon)$-absolute central moment bounded by $\upsilon$ for some $\epsilon \in (0, 1]$. We improve both upper and lower bounds on the minimax regret compared to prior work. When $\upsilon = O(1)$, the best prior known regret upper bound is $\tilde{O}(d T^{\frac{1}{1 + \epsilon}})$. While a lower bound with the same scaling has been given, it relies on a construction using $\upsilon = O(d)$, and adapting the construction to the bounded-moment regime with $\upsilon = O(1)$ yields only an $\Omega(d^{\frac{\epsilon}{1 + \epsilon}} T^{\frac{1}{1 + \epsilon}})$ lower bound. This matches the known rate for multi-armed bandits and is generally loose for linear bandits, in particular being $\sqrt{d}$ below the optimal rate in the finite-variance case ($\epsilon = 1$). We…
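
As a quick check on the rates quoted above, the gap between the prior upper bound and the adapted lower bound can be worked out directly. This is only arithmetic on the two expressions in the abstract, ignoring logarithmic factors; it is not a result taken from the paper body.

```latex
% Ratio of the prior upper bound to the adapted lower bound (log factors ignored)
\[
  \frac{d \, T^{\frac{1}{1+\epsilon}}}
       {d^{\frac{\epsilon}{1+\epsilon}} \, T^{\frac{1}{1+\epsilon}}}
  = d^{\,1 - \frac{\epsilon}{1+\epsilon}}
  = d^{\frac{1}{1+\epsilon}} .
\]
% Finite-variance case: set \epsilon = 1
\[
  \text{upper: } \tilde{O}\bigl(d \sqrt{T}\bigr), \qquad
  \text{lower: } \Omega\bigl(\sqrt{d T}\bigr), \qquad
  \text{gap: } d^{\frac{1}{2}} = \sqrt{d} .
\]
```

The gap therefore ranges from $\sqrt{d}$ at $\epsilon = 1$ up to nearly $d$ as $\epsilon \to 0$.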

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Age of Information Optimization

Methods: Sparse Evolutionary Training