Bandit Allocational Instability

Yilun Chen; Jiaqi Lu

arXiv:2602.07472·cs.LG·February 10, 2026

Open Access

TL;DR

This paper introduces allocation variability, a new performance metric for multi-armed bandit algorithms. It shows that allocation variability is in a fundamental trade-off with regret, proves a lower bound on that trade-off, and gives a tunable algorithm, UCB-f, that achieves the Pareto optimal trade-off.

Contribution

It establishes a fundamental trade-off between allocation variability and regret in bandit algorithms, introduces a tunable algorithm UCB-f, and resolves an open question in the field.

Findings

01 Worst-case regret and allocation variability satisfy $R_T \cdot S_T = \Omega(T^{3/2})$

02 Any sublinear regret algorithm must have $S_T = \omega(\sqrt{T})$

03 UCB-f achieves the Pareto optimal trade-off
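The second finding can be read off from the first. The short derivation below is a sketch, not the paper's proof; the constant $c > 0$ is a hypothetical name for the constant hidden in the $\Omega(\cdot)$ bound, and the inequalities are assumed to hold for all sufficiently large $T$.

```latex
\begin{align*}
R_T \cdot S_T \ge c\,T^{3/2}
  \quad &\Longrightarrow \quad
  S_T \ge \frac{c\,T^{3/2}}{R_T}
  \quad \Longrightarrow \quad
  \frac{S_T}{\sqrt{T}} \ge \frac{c\,T}{R_T}, \\
R_T = o(T)
  \quad &\Longrightarrow \quad
  \frac{c\,T}{R_T} \to \infty \quad (T \to \infty), \\
\text{hence} \quad S_T &= \omega\!\left(\sqrt{T}\right).
\end{align*}
```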

Abstract

When multi-armed bandit (MAB) algorithms allocate pulls among competing arms, the resulting allocation can exhibit huge variation. This is particularly harmful in modern applications such as learning-enhanced platform operations and post-bandit statistical inference. Thus motivated, we introduce a new performance metric of MAB algorithms termed allocation variability, which is the largest (over arms) standard deviation of an arm's number of pulls. We establish a fundamental trade-off between allocation variability and regret, the canonical performance metric of reward maximization. In particular, for any algorithm, the worst-case regret $R_T$ and worst-case allocation variability $S_T$ must satisfy $R_T \cdot S_T = \Omega(T^{3/2})$ as $T \to \infty$, as long as $R_T = o(T)$. This indicates that any minimax regret-optimal algorithm must incur worst-case allocation variability…
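To make the definition of allocation variability concrete, the following is a minimal Monte Carlo sketch in Python. It is an illustration under stated assumptions, not the paper's method: it uses the standard UCB1 index as a stand-in (the paper's UCB-f is not specified in this summary), Bernoulli rewards, and a single fixed instance rather than the worst case over instances. The function names `ucb1_pull_counts` and `allocation_variability` are hypothetical.

```python
import math
import random
import statistics


def ucb1_pull_counts(means, horizon, rng):
    """Run standard UCB1 once on Bernoulli arms and return pulls per arm.

    Assumption: UCB1 is only a stand-in here, not the paper's UCB-f.
    """
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise the estimates
        else:
            # Choose the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


def allocation_variability(means, horizon, runs, seed=0):
    """Estimate max over arms of the std. dev. of that arm's pull count.

    The standard deviation is taken across independent replications on one
    fixed instance; the worst case over instances is not computed.
    """
    rng = random.Random(seed)
    per_run = [ucb1_pull_counts(means, horizon, rng) for _ in range(runs)]
    per_arm = zip(*per_run)  # regroup: one tuple of pull counts per arm
    return max(statistics.pstdev(arm_counts) for arm_counts in per_arm)


if __name__ == "__main__":
    # Two identical arms: the algorithm has no reason to prefer either one.
    print(allocation_variability([0.5, 0.5], horizon=2000, runs=200))
```

Running the estimator for several horizons and comparing the result against $\sqrt{T}$ is one way to observe how the spread of pull counts grows on a given instance, keeping in mind that the paper's bounds concern the worst case.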

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms