Asymptotic Theory and Sequential Test for General Multi-Armed Bandit Process

Li Yang, Xiaodong Yan, and Dandan Jiang

arXiv:2602.22768 · stat.ME · February 27, 2026

TL;DR

This paper introduces the Urn Bandit (UNB) process, which integrates urn models with multi-armed bandit principles, and develops asymptotic theory and sequential testing methods for non-i.i.d. reward sequences. Simulations and real data show improved rewards.

Contribution

The paper develops the Urn Bandit process and establishes the first asymptotic theory for non-i.i.d. rewards in MAB settings, enabling sequential testing alongside adaptive resource allocation.

Findings

1. UNB ensures almost sure convergence of resource allocation to the optimal arms.
2. The joint functional central limit theorem (FCLT) holds for non-i.i.d. rewards that are correlated across arms.
3. Simulations and real-data analyses show improved reward performance.

Abstract

Multi-armed bandit (MAB) processes constitute a foundational subclass of reinforcement learning problems and a central topic in statistical decision theory. However, their use for simultaneous adaptive allocation and sequential testing is limited by the absence of asymptotic theory for non-i.i.d. sequences and sublinear information. To address this open challenge, we propose the Urn Bandit (UNB) process, which integrates the reinforcement mechanism of urn probabilistic models with MAB principles and ensures almost sure convergence of resource allocation to the optimal arms. We establish a joint functional central limit theorem (FCLT) for consistent estimators of the expected rewards under non-i.i.d., non-sub-Gaussian, and sublinear reward samples with pairwise correlations across arms. To overcome the limitations of existing methods, which focus mainly on cumulative regret, we establish the asymptotic…
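The abstract does not spell out the UNB update rule. To illustrate the kind of urn reinforcement it refers to, here is a minimal Python sketch of a randomized Pólya urn. This is a classical urn design from adaptive allocation and is not the authors' UNB process. The sketch assumes Bernoulli rewards. At each round an arm is drawn with probability proportional to its ball count, and each success adds one ball of that arm's colour. The function name `randomized_polya_urn` and its parameters are hypothetical.

```python
import random


def randomized_polya_urn(success_probs, n_rounds, init_balls=1, seed=0):
    """Simulate a randomized Polya urn allocation with Bernoulli rewards.

    Hypothetical illustration (not the paper's UNB process): each arm starts
    with `init_balls` balls; a ball is drawn with probability proportional to
    the urn composition, the corresponding arm is pulled, and one ball of that
    arm's colour is added on success. The drawn ball is returned to the urn.
    """
    if init_balls <= 0:
        raise ValueError("init_balls must be positive")
    rng = random.Random(seed)
    k = len(success_probs)
    balls = [float(init_balls)] * k  # current urn composition
    pulls = [0] * k                  # number of times each arm was allocated
    rewards = [0] * k                # cumulative reward per arm
    for _ in range(n_rounds):
        # Draw an arm with probability proportional to its ball count.
        arm = rng.choices(range(k), weights=balls, k=1)[0]
        reward = 1 if rng.random() < success_probs[arm] else 0
        pulls[arm] += 1
        rewards[arm] += reward
        # Reinforcement: a success adds one ball of the drawn arm's colour.
        balls[arm] += reward
    # Per-arm sample means as naive estimators of the expected rewards
    # (NaN for an arm that was never pulled).
    means = [r / p if p else float("nan") for r, p in zip(rewards, pulls)]
    return pulls, means


if __name__ == "__main__":
    pulls, means = randomized_polya_urn([0.3, 0.5, 0.8], n_rounds=20000, seed=1)
    total = sum(pulls)
    for arm, (p, m) in enumerate(zip(pulls, means)):
        print(f"arm {arm}: share={p / total:.3f}, mean estimate={m:.3f}")
```

In this simple rule, allocation tends to concentrate on the arm with the highest success probability as the number of rounds grows, but convergence is slow. The naive per-arm sample means printed here carry none of the FCLT guarantees the paper establishes for adaptively collected, cross-correlated rewards.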

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques