Asymptotic Theory and Sequential Test for General Multi-Armed Bandit Process
Li Yang, Xiaodong Yan, and Dandan Jiang

TL;DR
This paper introduces the Urn Bandit (UNB) process, which integrates urn models with multi-armed bandit (MAB) principles. It provides asymptotic theory and sequential testing methods for non-i.i.d. reward sequences and reports improved reward outcomes.
Contribution
It develops the Urn Bandit process and establishes the first asymptotic theory for non-i.i.d. rewards, enabling effective sequential testing and resource allocation in complex MAB settings.
Findings
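The Urn Bandit (UNB) process ensures almost sure convergence of resource allocation to the optimal arms.
The joint functional central limit theorem (FCLT) holds for consistent estimators of expected rewards under non-i.i.d., non-sub-Gaussian rewards with pairwise correlations across arms.
Simulations and real-data experiments show improved reward performance.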
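To make the idea of urn-style reinforcement concrete, the following is a minimal toy sketch in Python. It is not the paper's UNB process, whose update rule is not given in this summary. It assumes Bernoulli rewards, one initial ball per arm, and a simple rule in which a success adds one ball for the arm that was drawn. The function name `simulate_urn_allocation` and all of its parameters are hypothetical.

```python
import random


def simulate_urn_allocation(success_probs, n_rounds, seed=0):
    """Toy urn-reinforced allocation over Bernoulli arms (illustrative only).

    Assumption: this is a generic urn reinforcement rule, not the UNB
    update described in the paper.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    balls = [1.0] * n_arms        # assumed start: one ball per arm
    pulls = [0] * n_arms          # number of times each arm was drawn
    reward_sums = [0.0] * n_arms  # total reward collected per arm

    for _ in range(n_rounds):
        # Draw an arm with probability proportional to its ball count.
        arm = rng.choices(range(n_arms), weights=balls, k=1)[0]
        # Observe a Bernoulli reward from the drawn arm.
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward
        # Reinforcement: a success adds one ball for the drawn arm.
        balls[arm] += reward

    # Sample-mean reward estimates; NaN for an arm that was never drawn.
    means = [s / n if n else float("nan") for s, n in zip(reward_sums, pulls)]
    return pulls, means


if __name__ == "__main__":
    pulls, means = simulate_urn_allocation([0.1, 0.9], n_rounds=5000, seed=1)
    print("pulls per arm:", pulls)
    print("estimated mean rewards:", means)
```

In this sketch the allocation is adaptive, so the samples collected from each arm depend on past rewards and are not an i.i.d. sample of fixed size. That dependence is the kind of setting in which the summary says dedicated asymptotic theory is needed.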
Abstract
Multi-armed bandit (MAB) processes constitute a foundational subclass of reinforcement learning problems and represent a central topic in statistical decision theory. However, they remain limited in supporting simultaneous adaptive allocation and sequential testing, because asymptotic theory is lacking for non-i.i.d. sequences and sublinear information. To address this open challenge, we propose the Urn Bandit (UNB) process, which integrates the reinforcement mechanism of urn probabilistic models with MAB principles and ensures almost sure convergence of resource allocation to optimal arms. We establish the joint functional central limit theorem (FCLT) for consistent estimators of expected rewards under non-i.i.d., non-sub-Gaussian and sublinear reward samples with pairwise correlations across arms. To overcome the limitations of existing methods that focus mainly on cumulative regret, we establish the asymptotic…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques
