Replicable Bandits with UCB based Exploration

Rohan Deb; Udaya Ghai; Karan Singh; Arindam Banerjee

arXiv:2604.20024·cs.LG·April 23, 2026

TL;DR

This paper introduces new replicable algorithms for stochastic multi-armed and linear bandits using UCB exploration, improving prior regret bounds and ensuring consistent action sequences across runs.

Contribution

It develops the first optimistic, UCB-based replicable algorithms for both bandit settings, with improved regret bounds and a novel replicable ridge regression estimator.

Findings

01

RepUCB achieves regret bounds with better dependence on $\rho$ and $K$.

02

RepRidge provides a confidence-guaranteed, replicable ridge regression estimator.

03

RepLinUCB reduces the regret dependence on dimension $d$ and replicability parameter $\rho$.

Abstract

We study replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits with UCB (Upper Confidence Bound) based exploration. A bandit algorithm is $\rho$-replicable if two executions that share internal randomness but observe independent reward realizations produce the same action sequence with probability at least $1 - \rho$. Prior work is primarily elimination-based and, in linear bandits with infinitely many actions, relies on discretization, leading to suboptimal dependence on the dimension $d$ and on $\rho$. We develop optimistic alternatives for both settings. For stochastic multi-armed bandits, we propose RepUCB, a replicable batched UCB algorithm, and show that it attains regret $O\left(\frac{K^{2} \log^{2} T}{\rho^{2}} \sum_{a : \Delta_{a} > 0} \left(\Delta_{a} + \frac{\log(K T \log T)}{\Delta_{a}}\right)\right)$. For stochastic linear bandits, we first introduce RepRidge, a replicable…
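To get a feel for how the RepUCB bound stated in the abstract scales with $K$, $T$, $\rho$, and the gaps $\Delta_a$, the expression inside the $O(\cdot)$ can be evaluated directly. The sketch below is only an illustration: the function name `repucb_regret_bound` is hypothetical, the hidden constant is dropped, and natural logarithms are assumed, so the returned numbers are meaningful only as ratios between settings, not as actual regret values.

```python
import math


def repucb_regret_bound(gaps, T, rho):
    """Evaluate the expression inside O(.) of the stated RepUCB bound.

    Assumptions (not from the paper): constants are dropped and all
    logarithms are natural. `gaps` holds one suboptimality gap per arm,
    with 0.0 for an optimal arm; K is taken to be len(gaps).
    """
    K = len(gaps)
    # Prefactor K^2 log^2(T) / rho^2.
    prefactor = (K ** 2) * (math.log(T) ** 2) / (rho ** 2)
    # Shared logarithmic term log(K T log T).
    log_term = math.log(K * T * math.log(T))
    # Sum over arms with a strictly positive gap only.
    per_arm = sum(g + log_term / g for g in gaps if g > 0)
    return prefactor * per_arm


if __name__ == "__main__":
    gaps = [0.0, 0.1, 0.2]  # hypothetical instance with one optimal arm
    for rho in (0.2, 0.1, 0.05):
        print(rho, repucb_regret_bound(gaps, T=10_000, rho=rho))
```

Since $\rho$ enters only through the $1/\rho^{2}$ prefactor, halving $\rho$ multiplies the evaluated expression by four while leaving the gap-dependent sum unchanged.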
