Selective Reviews of Bandit Problems in AI via a Statistical View

Pengjie Zhou; Haoyu Wei; Huiming Zhang

arXiv:2412.02251·stat.ML·February 20, 2025

Open Access

TL;DR

This paper reviews the foundational models, theoretical tools, and recent advances in stochastic multi-armed and continuum-armed bandit problems within reinforcement learning, comparing algorithms and exploring their connections to functional data analysis.

Contribution

It provides a comprehensive overview of models, theoretical methods, and recent developments in bandit problems, highlighting connections to functional data analysis and ongoing challenges.

Findings

01 Comparison of frequentist and Bayesian algorithms for exploration-exploitation

02 Analysis of regret bounds and theoretical tools like concentration inequalities

03 Discussion of recent advances and open challenges in bandit research

Abstract

Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents to make decisions through interactions with their environment. A key subset comprises the stochastic multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools such as concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing the exploration-exploitation trade-off. Additionally, we cover K-armed contextual bandits and SCAB, focusing on their methodologies and regret analyses. We also examine the connections between SCAB problems and functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.
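To make the frequentist versus Bayesian comparison in the abstract concrete, here is a minimal sketch that is not taken from the paper. It assumes Bernoulli rewards and uses UCB1 as a stand-in frequentist algorithm and Beta-Bernoulli Thompson sampling as a stand-in Bayesian one. The function names, the arm means, and the horizon are all illustrative choices, and the quantity tracked is the cumulative pseudo-regret (the sum of gaps between the best mean and the mean of the arm pulled).

```python
import math
import random


def ucb1_regret(means, horizon, rng):
    """Run UCB1 on Bernoulli arms; return cumulative pseudo-regret."""
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull every arm once to initialise the estimates
        else:
            # Empirical mean plus a confidence bonus that shrinks with pulls.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def thompson_regret(means, horizon, rng):
    """Run Beta-Bernoulli Thompson sampling; return cumulative pseudo-regret."""
    k = len(means)
    alpha = [1.0] * k     # Beta(1, 1) prior, i.e. uniform on [0, 1] (assumed)
    beta = [1.0] * k
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one posterior sample per arm and play the largest draw.
        arm = max(range(k), key=lambda a: rng.betavariate(alpha[a], beta[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        regret += best - means[arm]
    return regret


if __name__ == "__main__":
    arm_means = [0.2, 0.5, 0.8]   # hypothetical arm means
    print("UCB1:", ucb1_regret(arm_means, 5000, random.Random(0)))
    print("Thompson:", thompson_regret(arm_means, 5000, random.Random(0)))
```

In this toy setting, pulling arms uniformly at random would incur pseudo-regret that grows linearly in the horizon, whereas both sketches are expected to grow only logarithmically; the exact numbers depend on the seed and on the assumed arm means.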

Taxonomy

Topics: Advanced Bandit Algorithms Research · AI in Service Interactions · Forecasting Techniques and Applications