Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning

Harin Lee; Min-hwan Oh

arXiv:2605.05102 · cs.LG · May 8, 2026


TL;DR

This paper introduces a unified probabilistic framework for distributional regret in multi-armed bandits and reinforcement learning, providing bounds that balance expected performance and tail risk.

Contribution

The paper proposes a simple UCBVI-style algorithm whose exploration bonus is controlled by user-specified parameters, and derives general gap-independent and gap-dependent distributional regret bounds that hold for arbitrary parameter sequences.

Findings

01

Achieves distributional regret bounds of order $O(\sqrt{AT}\,\log(1/\delta))$ for multi-armed bandits.

02

Provides a unified framework for gap-independent and gap-dependent bounds.

03

Confirms a conjecture by Lattimore & Szepesvári (2020) regarding distributional regret.
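
To illustrate how a bound of the form in the first finding relates tail risk to expected performance, the sketch below integrates an assumed bound $B(\delta) = C\sqrt{AT}\,\log(1/\delta)$ over $\delta \in (0, 1]$. It relies on the standard quantile argument: if the regret exceeds $B(\delta)$ with probability at most $\delta$ for every $\delta$, then the expected regret is at most $\int_0^1 B(\delta)\,d\delta$. The constant `constant`, the function names, and the reading of $A$ as the number of arms and $T$ as the horizon are assumptions made for illustration; none of them is taken from the paper.

```python
import math


def regret_bound(delta, num_arms, horizon, constant=1.0):
    """Assumed bound shape: constant * sqrt(A * T) * log(1 / delta)."""
    return constant * math.sqrt(num_arms * horizon) * math.log(1.0 / delta)


def expected_regret_from_tail(num_arms, horizon, constant=1.0, grid=100_000):
    """Midpoint-rule approximation of the integral of the bound over (0, 1].

    Since the integral of log(1 / delta) over (0, 1] equals 1, the result
    should be close to constant * sqrt(A * T).
    """
    step = 1.0 / grid
    total = 0.0
    for i in range(grid):
        delta = (i + 0.5) * step  # midpoint of the i-th sub-interval
        total += regret_bound(delta, num_arms, horizon, constant)
    return total * step


if __name__ == "__main__":
    A, T = 10, 1000  # hypothetical instance size
    print("tail bound at delta = 0.01:", regret_bound(0.01, A, T))
    print("implied bound on expected regret:", expected_regret_from_tail(A, T))
    print("sqrt(A * T):", math.sqrt(A * T))
```

Under these assumptions, a bound that holds at every confidence level yields an expected-regret bound of order $\sqrt{AT}$ with no extra logarithmic factor, while tightening $\delta$ only increases the tail bound by the factor $\log(1/\delta)$.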

Abstract

We study the distribution of regret in stochastic multi-armed bandits and episodic reinforcement learning through a unified framework. We formalize a distributional regret bound as a probabilistic guarantee that holds uniformly over all confidence levels $\delta \in (0, 1]$, thereby characterizing the regret distribution across the full range of $\delta$. We present a simple UCBVI-style algorithm with exploration bonus $\min\{c_{1,k}/N,\ c_{2,k}/N\}$, where $N$ denotes the visit count and $(c_{1,k}, c_{2,k})$ are user-specified parameters. For arbitrary parameter sequences, we derive general gap-independent and gap-dependent distributional regret bounds, yielding a principled characterization of how the parameters control the trade-off between expected performance, tail risk, and instance-dependent behavior. In particular, our bounds achieve optimal trade-offs between expected and…
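
One plausible way to write down the "uniform over all confidence levels" guarantee described in the abstract is sketched below. The symbols $\mathrm{Reg}_T$ (random regret after $T$ rounds) and $B(T, \delta)$ (the bound as a function of the confidence level) are notation assumed here for illustration and may differ from the paper's own definitions.

```latex
% Distributional guarantee (assumed notation): one algorithm, run without
% knowledge of \delta, satisfies the tail bound at every confidence level.
\[
  \Pr\bigl( \mathrm{Reg}_T > B(T, \delta) \bigr) \le \delta
  \qquad \text{for every } \delta \in (0, 1].
\]
% Contrast: a conventional high-probability guarantee fixes one \delta_0
% in advance, and the algorithm may be tuned to that \delta_0.
\[
  \Pr\bigl( \mathrm{Reg}_T > B(T, \delta_0) \bigr) \le \delta_0
  \qquad \text{for a single, pre-specified } \delta_0 .
\]
```

Read this way, the first statement constrains the whole upper tail of the regret distribution at once, whereas the second constrains only one quantile.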
