Finite-Time Regret Analysis of Retry-Aware Bandits

Bingkui Tong; Junpei Komiyama; Soichiro Nishimori; Paavo Parmas

arXiv:2605.20854·cs.LG·May 21, 2026

TL;DR

This paper analyzes ReMax, a retry-aware bandit algorithm, proves the first sublinear regret bound for it, and compares its exploration-exploitation behavior with that of Thompson sampling.

Contribution

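For Gaussian rewards and $M = 2$, it characterizes the optimal ReMax distribution through an expected-improvement balance condition, proves a sublinear regret bound, and identifies a ReMax-specific underestimation effect in which the optimal arm may be sampled too rarely after an unfavorable estimate.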

Findings

01. ReMax often outperforms KL-UCB and Thompson sampling under mild underestimation.

02. Posterior-variance scaling empirically mitigates severe underestimation.

03. ReMax can be more exploitative than Thompson sampling.

Abstract

We study a stochastic bandit algorithm motivated by retry-aware objectives that value the best outcome among multiple attempts, such as pass@$k$ and max@$k$. Given a posterior over arm values, ReMax chooses a sampling distribution that maximizes the posterior expected maximum reward over $M$ virtual draws. Although this objective was introduced in reinforcement learning as an exploration mechanism under uncertainty, its regret properties in bandit problems have remained unclear. For Gaussian rewards and the first nontrivial case $M = 2$, we characterize the optimal ReMax distribution through an expected-improvement balance condition and prove the first sublinear regret bound for ReMax. Our analysis separates the usual saturation behavior of suboptimal arms from a ReMax-specific underestimation effect, in which the optimal arm may be sampled too rarely after an unfavorable estimate. This…
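
To make the $M = 2$ objective concrete, here is a minimal Python sketch. It is not the authors' implementation. It rests on assumptions not stated in the abstract: the posterior over arm values is an independent Gaussian $N(m_i, s_i^2)$ per arm, the two virtual draws are i.i.d. from the sampling distribution $p$, and the maximum is taken over the posterior arm values of the drawn arms, so the objective reads $\sum_{i,j} p_i p_j \, E[\max(\mu_i, \mu_j)]$. The function names (`expected_max`, `objective_m2`, `two_arm_optimum`) are hypothetical. Under these assumptions the two-arm maximizer has a closed form in which each arm's probability is proportional to its expected improvement over the other arm, which is one possible reading of an expected-improvement balance.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_max(m_i, s_i, m_j, s_j):
    """E[max(X, Y)] for independent X ~ N(m_i, s_i^2), Y ~ N(m_j, s_j^2)."""
    theta = math.sqrt(s_i ** 2 + s_j ** 2)
    if theta == 0.0:
        return max(m_i, m_j)
    z = (m_i - m_j) / theta
    return m_i * normal_cdf(z) + m_j * normal_cdf(-z) + theta * normal_pdf(z)


def objective_m2(p, means, stds):
    """Assumed M = 2 objective: sum_{i,j} p_i p_j E[max(mu_i, mu_j)].

    Drawing the same arm twice yields the same arm value, so the
    diagonal term is the posterior mean of that arm.
    """
    total = 0.0
    for i in range(len(p)):
        for j in range(len(p)):
            if i == j:
                w = means[i]
            else:
                w = expected_max(means[i], stds[i], means[j], stds[j])
            total += p[i] * p[j] * w
    return total


def two_arm_optimum(means, stds):
    """Closed-form maximizer of objective_m2 for two arms (stds must be > 0).

    Setting the derivative of the concave quadratic in q = p[0] to zero
    gives q = EI_0 / (EI_0 + EI_1), where EI_0 = E[(mu_0 - mu_1)^+].
    """
    e01 = expected_max(means[0], stds[0], means[1], stds[1])
    ei_0 = e01 - means[1]  # expected improvement of arm 0 over arm 1
    ei_1 = e01 - means[0]  # expected improvement of arm 1 over arm 0
    q = ei_0 / (ei_0 + ei_1)
    return [q, 1.0 - q]


if __name__ == "__main__":
    means, stds = [1.0, 0.0], [1.0, 1.0]
    p_star = two_arm_optimum(means, stds)
    print("sampling distribution:", p_star)
    print("objective value:", objective_m2(p_star, means, stds))
```

With more than two arms the same objective is a quadratic form in $p$ over the probability simplex, so a generic constrained optimizer could replace the closed form. That extension is outside this sketch.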
