Local Asymptotic Normality for Multi-Armed Bandits

Ramon van den Akker; Bas J.M. Werker; Bo Zhou

arXiv:2512.12192·math.ST·December 16, 2025

TL;DR

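For multi-armed bandit problems in which the arms' expected payoffs are fixed and one arm is uniquely optimal, this paper shows that the model satisfies the standard Locally Asymptotically Normal (LAN) property under common sampling schemes such as UCB and Thompson sampling. This contrasts with the non-standard Locally Asymptotically Quadratic (LAQ) limit that arises when the payoffs differ by $O(T^{-1/2})$.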

Contribution

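It establishes the LAN property for multi-armed bandits with fixed expected payoffs and a unique best arm, under sampling schemes that satisfy mild regularity conditions. This complements the authors' earlier LAQ result for arms whose expected payoffs differ by $O(T^{-1/2})$.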

Findings

01. The LAN property holds for fixed-payoff bandits with a unique optimal arm.

02. UCB and Thompson sampling satisfy the regularity conditions required for LAN.

03. This contrasts with the LAQ case, where the arms' expected payoffs differ by $O(T^{-1/2})$.
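
As an informal illustration of the fixed-payoff setting, the sketch below runs a textbook UCB1 rule on three arms and reports how often each arm is pulled. It is not taken from the paper: the Gaussian unit-variance rewards, the particular means, the index form sqrt(2 log t / n), and the function name run_ucb1 are all assumptions chosen for the example. It only shows the well-known behaviour that, with a unique best arm, UCB1 spends most of its pulls on that arm as the horizon grows.

```python
import math
import random


def run_ucb1(means, horizon, seed=0):
    """Run a textbook UCB1 rule and return the number of pulls per arm.

    Assumptions (not from the paper): rewards are Gaussian with unit
    variance, and the index is empirical mean + sqrt(2 * log(t) / n).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls of each arm so far
    sums = [0.0] * k   # cumulative reward of each arm so far

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward

    return counts


if __name__ == "__main__":
    # Hypothetical fixed expected payoffs; arm 2 is the unique optimal arm.
    means = [0.0, 0.5, 1.0]
    for horizon in (1_000, 10_000, 100_000):
        counts = run_ucb1(means, horizon)
        share = counts[2] / horizon
        print(f"T={horizon}: pulls per arm={counts}, optimal-arm share={share:.3f}")
```

In such a run the suboptimal arms are sampled far less often than the optimal one, so information about each arm accumulates at a different speed. How the paper handles this in its regularity conditions is not described in this summary.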

Abstract

Van den Akker, Werker, and Zhou (2025) showed that the limit experiment, in the sense of Hájek-Le Cam, for (contextual) bandits whose arms' expected payoffs differ by $O(T^{-1/2})$, is Locally Asymptotically Quadratic (LAQ) but highly non-standard, being characterized by a system of coupled stochastic differential equations. The present paper considers the complementary case where the arms' expected payoffs are fixed with a unique optimal (in the sense of highest expected payoff) arm. It is shown that, under sampling schemes satisfying mild regularity conditions (including UCB and Thompson sampling), the model satisfies the standard Locally Asymptotically Normal (LAN) property.
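
For readers unfamiliar with the term, the textbook form of the LAN property can be written as follows. This is the generic definition, not the paper's exact statement: the symbols $\theta$ (parameter), $h$ (local direction), $r_T$ (normalizing rate, possibly a matrix), $\Delta_T$ (central sequence), and $J$ (Fisher information) are notation assumed here, and the paper's own normalization may differ.

```latex
% Generic LAN expansion of the log-likelihood ratio (assumed notation).
\[
  \log \frac{\mathrm{d}P^{(T)}_{\theta + r_T^{-1} h}}{\mathrm{d}P^{(T)}_{\theta}}
  \;=\; h^\top \Delta_T \;-\; \tfrac{1}{2}\, h^\top J\, h \;+\; o_P(1),
  \qquad
  \Delta_T \rightsquigarrow N(0, J) \quad \text{under } P^{(T)}_{\theta}.
\]
```

The defining feature is that $J$ is non-random, so the limit experiment is a Gaussian shift. In an LAQ expansion the quadratic term may instead be random, which is the sense in which the $O(T^{-1/2})$ case is non-standard.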

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic processes and financial applications · Auction Theory and Applications