Replicable Bandits with UCB based Exploration
Rohan Deb, Udaya Ghai, Karan Singh, Arindam Banerjee

TL;DR
This paper introduces new replicable algorithms for stochastic multi-armed and linear bandits using UCB exploration, improving on prior regret bounds while producing, with high probability, the same action sequence across runs.
Contribution
It develops the first optimistic, UCB-based replicable algorithms for both bandit settings, with improved regret bounds and a novel replicable ridge regression estimator.
Findings
RepUCB achieves regret bounds with better dependence on $\rho$ and $K$.
RepRidge provides a confidence-guaranteed, replicable ridge regression estimator.
RepLinUCB reduces the regret dependence on dimension $d$ and replicability parameter $\rho$.
Abstract
We study replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits with UCB (Upper Confidence Bound) based exploration. A bandit algorithm is $\rho$-replicable if two executions using shared internal randomness but independent reward realizations produce the same action sequence with probability at least $1-\rho$. Prior work is primarily elimination-based and, in linear bandits with infinitely many actions, relies on discretization, leading to suboptimal dependence on the dimension $d$ and $\rho$. We develop optimistic alternatives for both settings. For stochastic multi-armed bandits, we propose RepUCB, a replicable batched UCB algorithm, and establish a regret bound for it. For stochastic linear bandits, we first introduce RepRidge, a replicable…
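The replicability definition in the abstract can be checked empirically. The sketch below is a toy illustration, not the paper's RepUCB: it runs a simple batched UCB rule twice with the same internal random seed but independent Bernoulli rewards, then compares the two action sequences. Rounding the UCB indices to a randomly shifted grid (with the shift drawn from the shared randomness) is an assumption made here for illustration, as are the function names, the confidence bonus, and all parameter values.

```python
import numpy as np


def batched_ucb_actions(means, num_batches, batch_size,
                        shared_seed, reward_seed, grid=0.1):
    """Toy batched UCB (hypothetical; NOT the paper's RepUCB).

    Returns the arm chosen in each batch.
    """
    k = len(means)
    shared = np.random.default_rng(shared_seed)  # internal randomness, shared across runs
    noise = np.random.default_rng(reward_seed)   # reward randomness, independent across runs

    # Warm-up: pull every arm for one batch so all counts are positive.
    counts = np.full(k, float(batch_size))
    sums = np.array([noise.binomial(batch_size, m) for m in means], dtype=float)

    actions = []
    for _ in range(num_batches):
        # Standard UCB index: empirical mean plus a confidence bonus.
        ucb = sums / counts + np.sqrt(2.0 * np.log(counts.sum()) / counts)
        # Assumed device: snap indices to a grid whose offset comes from the
        # shared randomness, so small reward fluctuations rarely change the argmax.
        offset = shared.uniform(0.0, grid)
        cell = np.floor((ucb + offset) / grid)
        arm = int(np.argmax(cell))  # ties resolve to the lowest arm index
        # Commit to the chosen arm for the whole batch.
        sums[arm] += noise.binomial(batch_size, means[arm])
        counts[arm] += batch_size
        actions.append(arm)
    return actions


def replicability_rate(means, num_batches, batch_size, trials=200):
    """Fraction of trials in which two runs produce identical action sequences."""
    same = 0
    for s in range(trials):
        run_a = batched_ucb_actions(means, num_batches, batch_size,
                                    shared_seed=s, reward_seed=2 * s + 1_000_000)
        run_b = batched_ucb_actions(means, num_batches, batch_size,
                                    shared_seed=s, reward_seed=2 * s + 1_000_001)
        same += int(run_a == run_b)
    return same / trials


if __name__ == "__main__":
    rate = replicability_rate([0.2, 0.5, 0.8], num_batches=20, batch_size=50)
    print(f"empirical match rate: {rate:.2f}")
```

The returned fraction is an empirical estimate of the probability in the definition; a $\rho$-replicable algorithm would need this probability to be at least $1-\rho$. The toy rule above carries no such guarantee.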
