Local Asymptotic Normality for Multi-Armed Bandits
Ramon van den Akker, Bas J.M. Werker, Bo Zhou

TL;DR
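This paper shows that multi-armed bandit problems with fixed expected payoffs and a unique optimal arm satisfy the standard Locally Asymptotically Normal (LAN) property under common sampling schemes, in contrast to the non-standard Locally Asymptotically Quadratic (LAQ) limit that arises when the arms' payoffs differ by O(T^{-1/2}).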
Contribution
It establishes the LAN property for multi-armed bandits with fixed payoffs and a unique best arm under sampling schemes satisfying mild regularity conditions, complementing the earlier LAQ result for arms whose expected payoffs differ by O(T^{-1/2}).
Findings
LAN property holds for fixed-payoff bandits with a unique optimal arm
UCB and Thompson sampling satisfy regularity conditions for LAN
Contrasts with LAQ case where payoffs differ by O(T^{-1/2})
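To make the fixed-payoff setting concrete, the following is a minimal simulation sketch, not the paper's construction. It assumes a two-armed bandit with unit-variance Gaussian rewards, fixed means 0.5 and 0.0, and the textbook UCB1 index; the function name `run_ucb1`, the reward model, and all numeric values are illustrative choices. It only shows that, with a fixed gap and a unique best arm, UCB1 ends up allocating most pulls to that arm.

```python
import math
import random


def run_ucb1(means, horizon, seed=0):
    """Simulate textbook UCB1 on Gaussian arms (hypothetical setup).

    means   -- fixed expected payoffs, one per arm
    horizon -- total number of pulls T
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # pulls per arm
    sums = [0.0] * k   # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so that each index is well defined.
            arm = t - 1
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = rng.gauss(means[arm], 1.0)  # unit-variance noise (assumed)
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = run_ucb1([0.5, 0.0], horizon=5000)
best_fraction = counts[0] / sum(counts)
print(counts, best_fraction)
```

Shrinking the gap between the two means toward the order of T^{-1/2} lets one explore the regime that the summary contrasts with the fixed-payoff case.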
Abstract
Van den Akker, Werker, and Zhou (2025) showed that the limit experiment, in the sense of Hájek-Le Cam, for (contextual) bandits whose arms' expected payoffs differ by O(T^{-1/2}), is Locally Asymptotically Quadratic (LAQ) but highly non-standard, being characterized by a system of coupled stochastic differential equations. The present paper considers the complementary case where the arms' expected payoffs are fixed with a unique optimal (in the sense of highest expected payoff) arm. It is shown that, under sampling schemes satisfying mild regularity conditions (including UCB and Thompson sampling), the model satisfies the standard Locally Asymptotically Normal (LAN) property.
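For readers unfamiliar with the terminology, the generic textbook form of the two properties can be sketched as follows. The notation is an assumption made for illustration: $P^{(T)}_{\theta}$ denotes the law of the data observed up to time $T$, $h$ a local parameter, and $\delta_T \to 0$ a normalizing rate. The abstract does not state the paper's own notation or rate, so none of these symbols should be read as the authors' choices.

```latex
% LAN at \theta: the local log-likelihood ratio is asymptotically
% quadratic in h with a deterministic, nonrandom matrix J.
\log \frac{dP^{(T)}_{\theta + \delta_T h}}{dP^{(T)}_{\theta}}
  = h^{\top} \Delta_T - \tfrac{1}{2}\, h^{\top} J\, h + o_{P}(1),
\qquad
\Delta_T \xrightarrow{d} N(0, J) \ \text{under } P^{(T)}_{\theta}.

% LAQ: same expansion, but the quadratic term J_T is allowed to be random
% in the limit, so the limit experiment need not be a Gaussian shift.
\log \frac{dP^{(T)}_{\theta + \delta_T h}}{dP^{(T)}_{\theta}}
  = h^{\top} \Delta_T - \tfrac{1}{2}\, h^{\top} J_T\, h + o_{P}(1).
```

Under LAN the limit experiment is a Gaussian shift, which is what makes standard asymptotic efficiency theory available.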
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic processes and financial applications · Auction Theory and Applications
