On Instability of Minimax Optimal Optimism-Based Bandit Algorithms
Samya Praharaj, Koulik Khamaru

TL;DR
This paper studies the stability of minimax optimal optimism-based bandit algorithms. It shows that many such algorithms are unstable, so their sample means can fail to be asymptotically normal, which points to a fundamental trade-off between minimax optimality and statistical stability.
Contribution
The paper gives a theoretical analysis showing that many minimax optimal UCB-style algorithms violate the Lai-Wei stability condition, and illustrates this instability with numerical simulations.
Findings
Minimax optimal UCB algorithms are generally unstable.
Sample means from these algorithms often lack asymptotic normality.
There is a fundamental tension between stability and minimax optimality.
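One way to probe stability empirically is to look at how the fraction of pulls given to an arm varies across independent runs. Under the Lai-Wei condition, an arm's pull count divided by a deterministic sequence converges to 1 in probability, so for a stable algorithm the across-run spread of the normalised count should shrink as the horizon grows. The sketch below measures that spread for a textbook UCB1-style index with bonus sqrt(2 log t / n). It is an illustration only and not the paper's simulation setup: the function names `run_ucb` and `pull_fraction_spread`, the bonus constant, the unit-variance Gaussian rewards, and the horizons are all assumptions.

```python
import math
import random


def run_ucb(means, horizon, rng):
    """Run a UCB1-style index policy on unit-variance Gaussian arms.

    Returns the per-arm pull counts and per-arm sample means.
    """
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the indices
        else:
            # optimistic index: sample mean plus an exploration bonus
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
    sample_means = [sums[a] / counts[a] for a in range(k)]
    return counts, sample_means


def pull_fraction_spread(means, horizon, n_runs, seed=0):
    """Mean and standard deviation, across runs, of arm 0's pull fraction."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_runs):
        counts, _ = run_ucb(means, horizon, rng)
        fractions.append(counts[0] / horizon)
    mean_frac = sum(fractions) / n_runs
    var = sum((f - mean_frac) ** 2 for f in fractions) / n_runs
    return mean_frac, math.sqrt(var)


if __name__ == "__main__":
    # Two arms with equal means; compare the spread at two horizons.
    for horizon in (500, 5000):
        mean_frac, sd = pull_fraction_spread([0.0, 0.0], horizon, n_runs=20)
        print(horizon, round(mean_frac, 3), round(sd, 3))
```

Swapping the index inside `run_ucb` for a different optimism-based rule and comparing how the spread behaves as the horizon grows gives a rough, informal check of the kind of instability the findings describe. It does not replace the paper's theoretical criteria.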
Abstract
Statistical inference from data generated by multi-armed bandit (MAB) algorithms is challenging due to their adaptive, non-i.i.d. nature. A classical manifestation is that sample averages of arm rewards under bandit sampling may fail to satisfy a central limit theorem. Lai and Wei's stability condition provides a sufficient, and essentially necessary, criterion for asymptotic normality in bandit problems. While the celebrated Upper Confidence Bound (UCB) algorithm satisfies this stability condition, it is not minimax optimal, raising the question of whether minimax optimality and statistical stability can be achieved simultaneously. In this paper, we analyze the stability properties of a broad class of bandit algorithms that are based on the optimism principle. We establish general structural conditions under which such algorithms violate the Lai-Wei stability criterion. As a…
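For reference, the stability criterion named in the abstract is commonly written as below. The notation is assumed here and is not taken from the source: $N_a(T)$ is the number of pulls of arm $a$ up to time $T$, $N_a^{*}(T)$ is a deterministic sequence tending to infinity, and $\bar{X}_a(T)$, $\mu_a$, $\sigma_a$ are the arm's sample mean, true mean, and reward standard deviation.

```latex
% Stability: the random pull count is asymptotically deterministic.
\[
  \frac{N_a(T)}{N_a^{*}(T)} \;\xrightarrow{\;p\;}\; 1
  \qquad \text{as } T \to \infty .
\]
% Consequence under stability (with suitable moment conditions on the rewards):
\[
  \frac{\sqrt{N_a(T)}\,\bigl(\bar{X}_a(T) - \mu_a\bigr)}{\sigma_a}
  \;\xrightarrow{\;d\;}\; \mathcal{N}(0, 1) .
\]
```

An algorithm is unstable in this sense when no deterministic sequence $N_a^{*}(T)$ makes the first limit hold for some arm.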
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference
