Regret Tail Characterization of Optimal Bandit Algorithms with Generic Rewards
Subhodip Panda, Shubhada Agrawal

TL;DR
This paper analyzes the tail behavior of regret for bandit algorithms that are asymptotically optimal in expectation, extending KL-inf-UCB to nonparametric rewards and giving tight tail bounds that unify bounded-support and heavy-tailed reward models.
Contribution
It extends the KL-inf-UCB algorithm to nonparametric rewards and derives a tight, unified characterization of regret tail probabilities for these algorithms.
Findings
Derived a novel upper bound on the regret tail probability.
Recovered known tail guarantees for bounded and heavy-tailed models.
Matched the lower bound for finitely-supported reward distributions.
Abstract
We study the tail behavior of regret in stochastic multi-armed bandits for algorithms that are asymptotically optimal in expectation. While minimizing expected regret is the classical objective, recent work shows that even such algorithms can exhibit heavy regret tails, incurring large regret with non-negligible probability. Existing sharp characterizations of regret tails are largely restricted to parametric settings, such as single-parameter exponential families. In this work, we extend the KL-inf-UCB algorithm to a broad nonparametric class of reward distributions satisfying mild assumptions, and establish its asymptotic optimality in expectation. We then analyze the tail behavior of its regret and derive a novel upper bound on the regret tail probability. As special cases, our results recover regret-tail guarantees for both bounded-support and heavy-tailed (moment-bounded)…
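To make the objects in the abstract concrete, here is a minimal sketch, assuming rewards supported in [0, 1] (only one special case of the paper's nonparametric class). KL_inf(η, x) is the smallest KL divergence from an empirical distribution η to a distribution in the class with mean at least x; for support in [0, 1] it has the standard dual form used below. A KL-inf-UCB-style index is the largest x with N·KL_inf(η, x) ≤ threshold, and the regret tail P(R_T > x) is estimated by Monte Carlo. The function names, the log(t) exploration threshold, the three-point reward distributions and the pseudo-regret estimator are all illustrative assumptions, not the authors' implementation.

```python
import math
import random
from collections import Counter


def kl_inf(counts, x, iters=40):
    """KL_inf(eta, x) for an empirical distribution eta on [0, 1].

    `counts` maps each observed reward value to its count. Uses the dual form
    (valid only for support in [0, 1] and 0 <= x < 1):
        KL_inf(eta, x) = max_{0 <= lam <= 1/(1-x)} E_eta[log(1 - lam * (X - x))].
    """
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    if x <= mean:
        return 0.0  # eta itself already has mean >= x
    lo, hi = 0.0, 1.0 / (1.0 - x)

    def grad(lam):
        # Derivative of the concave dual objective (times n); decreasing in lam.
        return sum(-c * (v - x) / (1.0 - lam * (v - x)) for v, c in counts.items())

    # Bisection on the derivative; grad(0) = n * (x - mean) > 0.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0:
            lo = mid
        else:
            hi = mid
    value = sum(c * math.log(1.0 - lo * (v - x)) for v, c in counts.items()) / n
    return max(value, 0.0)


def kl_inf_ucb_index(counts, t, iters=25):
    """Largest x in [mean, 1] with n * KL_inf(eta_hat, x) <= log(t).

    The threshold log(t) is a hypothetical simplification; the paper's
    exploration function may differ.
    """
    n = sum(counts.values())
    lo = sum(v * c for v, c in counts.items()) / n
    hi = 1.0
    threshold = math.log(max(t, 1))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if n * kl_inf(counts, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo


def pseudo_regret(arms, horizon, rng):
    """One run of the index policy; returns sum_a N_a(T) * gap_a."""
    means = [sum(v * p for v, p in zip(support, probs)) for support, probs in arms]
    best = max(means)
    counts = [Counter() for _ in arms]
    for t in range(1, horizon + 1):
        if t <= len(arms):
            a = t - 1  # initialisation: pull every arm once
        else:
            a = max(range(len(arms)), key=lambda i: kl_inf_ucb_index(counts[i], t))
        support, probs = arms[a]
        counts[a][rng.choices(support, weights=probs)[0]] += 1
    return sum(sum(c.values()) * (best - m) for c, m in zip(counts, means))


def regret_tail_probability(arms, horizon, x, runs=20, seed=0):
    """Monte Carlo estimate of P(R_T > x) over independent runs."""
    rng = random.Random(seed)
    regrets = [pseudo_regret(arms, horizon, rng) for _ in range(runs)]
    return sum(r > x for r in regrets) / runs, regrets


if __name__ == "__main__":
    # Hypothetical finitely-supported arms on {0, 0.5, 1}.
    arms = [([0.0, 0.5, 1.0], [0.3, 0.4, 0.3]),  # mean 0.5
            ([0.0, 0.5, 1.0], [0.2, 0.4, 0.4])]  # mean 0.6
    p_hat, _ = regret_tail_probability(arms, horizon=100, x=5.0)
    print(f"estimated P(R_T > 5) = {p_hat:.2f}")
```

Storing each arm's empirical distribution as a `Counter` keeps every KL_inf evaluation proportional to the number of distinct observed rewards, which is cheap for finitely-supported rewards. For Bernoulli samples the dual reduces to the binary KL divergence kl(p, x), which gives a quick sanity check. Estimating a small tail probability reliably needs far more runs than this sketch uses.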
