Tail Distribution of Regret in Optimistic Reinforcement Learning

Sajad Khodadadian; Mehrdad Moharrami

arXiv:2511.18247 · cs.LG · March 18, 2026


TL;DR

This paper derives instance-dependent tail bounds on the regret of optimistic reinforcement learning in finite-horizon tabular MDPs, characterizing the probability of large deviations and the distribution of regret rather than only its mean.

Contribution

It derives explicit tail bounds on regret for both model-based (UCBVI-type) and model-free (optimistic Q-learning) algorithms, extending the analysis beyond expected regret to the tail of the regret distribution.
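To see why a tail characterization says more than the mean, the sketch below estimates $P(R_K \geq x)$ empirically from a set of regret samples. The sample values and the helper name `empirical_tail` are hypothetical and purely illustrative; they are not results or code from the paper. A single bad run barely moves the mean but shows up clearly in the tail curve.

```python
from statistics import mean


def empirical_tail(regrets: list[float], x: float) -> float:
    """Empirical estimate of P(R_K >= x): fraction of runs with regret >= x."""
    return sum(r >= x for r in regrets) / len(regrets)


# Hypothetical cumulative-regret samples R_K from 8 independent runs
# (illustrative numbers only, not data from the paper).
samples = [10.0, 12.0, 11.0, 9.0, 13.0, 10.0, 11.0, 60.0]

if __name__ == "__main__":
    # The mean summarizes E[R_K] and hides how the mass is spread out.
    print("mean regret:", mean(samples))
    # The tail curve x -> P(R_K >= x) exposes the rare large-regret run.
    for x in (10, 20, 40, 60):
        print(f"P(R_K >= {x}) ~ {empirical_tail(samples, x):.3f}")
```

Here the mean is 17.0, while the estimated tail probability stays at 0.125 all the way out to $x = 60$, which is the kind of behavior a bound on $E[R_K]$ alone cannot describe.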

Findings

01

Tail bounds exhibit a two-regime structure: sub-Gaussian then sub-Weibull tails.

02

Bounds depend on an instance-dependent scale and a transition threshold.

03

The algorithms' regret bounds are adjustable via a tuning parameter α.
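The two-regime shape described in the findings can be pictured with a simple piecewise envelope: a sub-Gaussian decay $\exp(-x^2/(2\sigma^2))$ up to a threshold $\tau$, then a Weibull-type decay beyond it. The sketch below is a generic illustration under stated assumptions, not the paper's bound: the scale `sigma`, threshold `tau`, and shape `beta` are hypothetical placeholders (and `beta` is unrelated to the paper's α). The far-tail branch is matched to the Gaussian value at $x = \tau$ so the envelope is continuous.

```python
import math


def two_regime_tail(x: float, sigma: float, tau: float, beta: float) -> float:
    """Illustrative upper envelope for P(R_K >= x) with a two-regime shape.

    Hypothetical parameters (not the paper's constants):
      sigma: instance-dependent scale
      tau:   transition threshold between the two regimes
      beta:  Weibull shape of the far tail (0 < beta < 2 gives a heavier tail)
    """
    if x <= 0:
        return 1.0  # a probability bound is trivially 1 here
    if x <= tau:
        # Sub-Gaussian regime: exp(-x^2 / (2 sigma^2)).
        return math.exp(-x * x / (2 * sigma**2))
    # Sub-Weibull regime, continuous at x = tau:
    # exp(-(tau^2 / (2 sigma^2)) * (x / tau)^beta).
    return math.exp(-(tau**2 / (2 * sigma**2)) * (x / tau) ** beta)


if __name__ == "__main__":
    for x in (1.0, 2.0, 3.0, 4.0):
        gaussian = math.exp(-x * x / 2)  # pure sub-Gaussian continuation, sigma = 1
        print(x, two_regime_tail(x, sigma=1.0, tau=2.0, beta=1.0), gaussian)
```

With `beta = 1`, the envelope coincides with the Gaussian curve up to `tau = 2` and then decays only exponentially, e.g. $e^{-4}$ instead of $e^{-8}$ at $x = 4$, which is what a heavier sub-Weibull far tail looks like.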

Abstract

We derive instance-dependent tail bounds for the regret of optimism-based reinforcement learning in finite-horizon tabular Markov decision processes with unknown transition dynamics. We first study a UCBVI-type (model-based) algorithm and characterize the tail distribution of the cumulative regret $R_{K}$ over $K$ episodes via explicit bounds on $P(R_{K} \geq x)$, going beyond analyses limited to $E[R_{K}]$ or a single high-probability quantile. We analyze two natural exploration-bonus schedules for UCBVI: (i) a $K$-dependent scheme that explicitly incorporates the total number of episodes $K$, and (ii) a $K$-independent (anytime) scheme that depends only on the current episode index. We then complement the model-based results with an analysis of optimistic Q-learning (model-free) under a $K$-dependent bonus schedule. Across both the model-based and model-free settings, we obtain upper…
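The difference between the two bonus schedules can be illustrated with a Hoeffding-style bonus of the form $c H \sqrt{\log(SAHL/\delta)/n}$, where $n$ is the visit count of a state-action pair. This form, the constant `c`, and the log argument are assumptions modeled on common UCBVI-style analyses, not the exact bonuses used in the paper. The only change between schedules is $L$: the total number of episodes $K$ for the $K$-dependent scheme, or the current episode index $k$ for the anytime scheme.

```python
import math
from typing import Optional


def exploration_bonus(n_visits: int, k: int, H: int, S: int, A: int,
                      K: Optional[int] = None, delta: float = 0.05,
                      c: float = 1.0) -> float:
    """Hoeffding-style bonus c * H * sqrt(log(S*A*H*L/delta) / n).

    Assumed form for illustration only. L = K (total episodes) for the
    K-dependent schedule, or L = k (current episode index, 1-indexed) for the
    anytime schedule when K is None.
    """
    horizon_term = K if K is not None else k
    log_term = math.log(S * A * H * horizon_term / delta)
    # Unvisited pairs (n_visits == 0) are treated as if visited once.
    return c * H * math.sqrt(log_term / max(n_visits, 1))


if __name__ == "__main__":
    # Early episode: the K-dependent bonus already pays for all K episodes.
    print(exploration_bonus(5, k=10, H=5, S=3, A=2, K=10_000))
    print(exploration_bonus(5, k=10, H=5, S=3, A=2))  # anytime
```

The design trade-off this exposes: the $K$-dependent schedule needs $K$ in advance and explores more aggressively early on, while the anytime schedule grows its log factor with $k$ and coincides with the $K$-dependent bonus only at $k = K$.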

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization