On Instability of Minimax Optimal Optimism-Based Bandit Algorithms

Samya Praharaj; Koulik Khamaru

arXiv:2511.18750 · stat.ML · November 25, 2025

Open Access

TL;DR

This paper investigates the stability of minimax optimal optimism-based bandit algorithms. It shows that many such algorithms violate the Lai-Wei stability condition, a sufficient criterion for asymptotic normality of arm-wise sample means, which points to a fundamental trade-off between minimax optimality and statistical stability.

Contribution

The paper provides a theoretical analysis showing that many minimax optimal UCB-style algorithms violate the Lai-Wei stability condition, and it illustrates the resulting instability through numerical simulations.
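The kind of instability studied here can be probed with a small simulation. The sketch below is a minimal illustration under stated assumptions, not the authors' experimental setup: it assumes two Gaussian arms with unit variance, uses the classical UCB index with bonus sqrt(2 log t / n), and uses the MOSS index of Audibert and Bubeck as a stand-in for a minimax optimal optimism-based rule (whether MOSS is among the algorithms the paper analyzes is an assumption). It records, across independent runs, the fraction of rounds spent on the first arm. The helpers `ucb_index`, `moss_index`, `run_bandit`, and `pull_fractions` are hypothetical names.

```python
import math
import random
from statistics import pstdev


def ucb_index(mean, n_pulls, t, horizon, n_arms):
    """Classical UCB index: sample mean plus sqrt(2 log t / n).

    `horizon` and `n_arms` are unused; they keep a common signature.
    """
    return mean + math.sqrt(2.0 * math.log(t) / n_pulls)


def moss_index(mean, n_pulls, t, horizon, n_arms):
    """MOSS-style index: sample mean plus sqrt(max(log(T / (K n)), 0) / n).

    `t` is unused; it keeps a common signature.
    """
    bonus = max(math.log(horizon / (n_arms * n_pulls)), 0.0)
    return mean + math.sqrt(bonus / n_pulls)


def run_bandit(index_fn, means, horizon, rng):
    """Run an optimism-based policy on unit-variance Gaussian arms.

    Returns the pull counts and per-arm reward sums after `horizon` rounds.
    """
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once
        else:
            # Optimism: pull the arm with the largest index (ties go to the lowest arm).
            arm = max(
                range(k),
                key=lambda a: index_fn(sums[a] / counts[a], counts[a], t, horizon, k),
            )
        counts[arm] += 1
        sums[arm] += rng.gauss(means[arm], 1.0)
    return counts, sums


def pull_fractions(index_fn, means, horizon, n_reps, seed=0):
    """Fraction of rounds spent on arm 0, one value per independent run."""
    rng = random.Random(seed)
    return [run_bandit(index_fn, means, horizon, rng)[0][0] / horizon for _ in range(n_reps)]


if __name__ == "__main__":
    # Equal means: the setting where allocation (in)stability is easiest to see.
    for name, fn in [("UCB", ucb_index), ("MOSS", moss_index)]:
        for horizon in (250, 1000):
            fr = pull_fractions(fn, [0.0, 0.0], horizon=horizon, n_reps=100)
            print(name, horizon, "std of N_0/T across runs:", round(pstdev(fr), 3))
```

With equal arm means, a stable allocation should make the pull fraction concentrate as the horizon grows, so its spread across runs shrinks; a spread that stays large as the horizon increases is the symptom of instability. The horizons and replication counts above are arbitrary illustrative choices.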

Findings

1. Many minimax optimal UCB-style algorithms are unstable in the sense of Lai and Wei.
2. Sample means of arm rewards collected by these algorithms can fail to be asymptotically normal.
3. There is a fundamental tension between stability and minimax optimality.
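Whether sample means behave normally can be checked empirically in a coarse way: over many independent runs, record one arm's pull count N and reward sum, form the standardized sample mean, and measure how often a nominal 95% Wald interval covers the true mean. The sketch below assumes, as in a simulation, that the true mean mu and noise level sigma (default 1) are known. `standardized_means` and `wald_coverage` are hypothetical helper names, and coverage is only one diagnostic of non-normality, not the paper's criterion.

```python
import math
from statistics import NormalDist


def standardized_means(counts, sums, mu, sigma=1.0):
    """One z-statistic per run: sqrt(N) * (sample mean - mu) / sigma."""
    return [math.sqrt(n) * (s / n - mu) / sigma for n, s in zip(counts, sums)]


def wald_coverage(z_stats, level=0.95):
    """Fraction of runs whose nominal Wald interval contains the true mean.

    If sqrt(N) * (sample mean - mu) / sigma is asymptotically standard normal,
    this fraction should approach `level` as the horizon grows.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return sum(abs(z) <= z_crit for z in z_stats) / len(z_stats)


if __name__ == "__main__":
    # Toy input: pull counts and reward sums for one arm from three hypothetical runs.
    z = standardized_means([4, 9, 16], [2.0, 0.0, -12.0], mu=0.0)
    print(z, wald_coverage(z))
```

In practice the counts and sums would come from replicated runs of a bandit algorithm; coverage well away from the nominal level, persisting as the horizon grows, is consistent with a failure of asymptotic normality.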

Abstract

Statistical inference from data generated by multi-armed bandit (MAB) algorithms is challenging due to their adaptive, non-i.i.d. nature. A classical manifestation is that sample averages of arm rewards under bandit sampling may fail to satisfy a central limit theorem. Lai and Wei's stability condition provides a sufficient, and essentially necessary, criterion for asymptotic normality in bandit problems. While the celebrated Upper Confidence Bound (UCB) algorithm satisfies this stability condition, it is not minimax optimal, raising the question of whether minimax optimality and statistical stability can be achieved simultaneously. In this paper, we analyze the stability properties of a broad class of bandit algorithms that are based on the optimism principle. We establish general structural conditions under which such algorithms violate the Lai-Wei stability criterion. As a…
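For reference, one common way to state the Lai-Wei stability condition for a single arm a, in notation introduced here rather than taken from the paper: let N_{a,T} be the number of pulls of arm a in the first T rounds, \bar X_{a,T} its sample mean, and \mu_a, \sigma_a^2 the mean and variance of its rewards (assumed i.i.d. given the arm, with finite variance). Stability asks that N_{a,T} track a deterministic sequence; under that condition a martingale central limit theorem yields asymptotic normality of the sample mean:

```latex
\exists\, n_{a,T} \text{ deterministic},\; n_{a,T} \to \infty, \quad
\frac{N_{a,T}}{n_{a,T}} \xrightarrow{\;p\;} 1
\quad\Longrightarrow\quad
\sqrt{N_{a,T}}\,\bigl(\bar X_{a,T} - \mu_a\bigr) \xrightarrow{\;d\;} \mathcal{N}\bigl(0, \sigma_a^2\bigr).
```

If instead N_{a,T} fluctuates at the scale of its own size (for example, N_{a,T}/T converging to a non-degenerate random limit), the left-hand statistic need not converge to a Gaussian limit.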

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference