Sequential Monte Carlo Bandits
Iñigo Urteaga, Chris H. Wiggins

TL;DR
This paper introduces a novel approach using sequential Monte Carlo methods to extend Bayesian multi-armed bandit algorithms, enabling effective decision-making in complex, non-stationary, and nonlinear reward environments.
Contribution
It develops SMC-based Bayesian bandit algorithms that handle nonlinear, non-stationary, and context-dependent reward distributions, surpassing limitations of traditional methods.
Findings
Demonstrates good regret performance in non-stationary, nonlinear bandit scenarios.
Addresses bandit problems whose reward statistics have no closed-form expression, which covers all but simple, stationary cases.
Shows effectiveness of SMC methods in dynamic, real-world settings.
Abstract
We extend Bayesian multi-armed bandit (MAB) algorithms beyond their original setting by making use of sequential Monte Carlo (SMC) methods. A MAB is a sequential decision-making problem where the goal is to learn a policy that maximizes long-term payoff, where only the reward of the executed action is observed. In the stochastic MAB, the reward for each action is generated from an unknown distribution, often assumed to be stationary. To decide which action to take next, a MAB agent must learn the characteristics of the unknown reward distribution, e.g., compute its sufficient statistics. However, closed-form expressions for these statistics are analytically intractable except for simple, stationary cases. We here utilize SMC for estimation of the statistics Bayesian MAB agents compute, and devise flexible policies that can address a rich class of bandit problems: i.e., MABs with…
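The idea in the abstract, replacing closed-form posterior statistics with an SMC approximation inside a Bayesian bandit policy, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Gaussian rewards with known noise, a random-walk model for each arm's latent mean, a bootstrap particle filter that resamples at every step, and Thompson sampling as the policy. The function name `smc_thompson_sampling` and all of its parameters are hypothetical, chosen only for this example.

```python
import numpy as np


def smc_thompson_sampling(mean_reward, n_arms, n_steps, n_particles=500,
                          drift_std=0.05, obs_std=0.5, seed=0):
    """Thompson sampling with a bootstrap particle filter per arm.

    Illustrative assumptions (not taken from the paper): each arm's latent
    mean follows a Gaussian random walk with std `drift_std`, and rewards are
    Gaussian around that mean with known std `obs_std`.
    `mean_reward(t)` returns the true per-arm mean rewards at time t.
    """
    rng = np.random.default_rng(seed)
    # Equally weighted particles approximating each arm's prior, N(0, 1).
    particles = rng.normal(0.0, 1.0, size=(n_arms, n_particles))
    actions = np.empty(n_steps, dtype=int)
    rewards = np.empty(n_steps)

    for t in range(n_steps):
        # Propagate: every arm's latent mean drifts, whether played or not.
        particles = particles + rng.normal(0.0, drift_std, size=particles.shape)

        # Thompson sampling: draw one particle per arm, play the best draw.
        picks = rng.integers(n_particles, size=n_arms)
        draws = particles[np.arange(n_arms), picks]
        arm = int(np.argmax(draws))

        # Only the reward of the played arm is observed.
        reward = mean_reward(t)[arm] + rng.normal(0.0, obs_std)

        # Weight the played arm's particles by the Gaussian likelihood.
        log_w = -0.5 * ((reward - particles[arm]) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
        w /= w.sum()

        # Resample so the particles are equally weighted again.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles[arm] = particles[arm, idx]

        actions[t], rewards[t] = arm, reward

    return actions, rewards, particles


if __name__ == "__main__":
    # Non-stationary toy problem: the better arm switches at t = 150.
    def switching_means(t):
        return np.array([1.0, 0.0]) if t < 150 else np.array([0.0, 1.0])

    actions, rewards, _ = smc_thompson_sampling(switching_means, n_arms=2, n_steps=300)
    print("fraction of plays on arm 1, first half:", np.mean(actions[:150] == 1))
    print("fraction of plays on arm 1, second half:", np.mean(actions[150:] == 1))
```

Because an unplayed arm's particles keep diffusing, its posterior widens over time and the policy eventually revisits it, which is what lets this sketch track a change in the best arm. Nonlinear or non-Gaussian reward models would change only the propagation and likelihood lines.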
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference
