Sequential Monte Carlo Bandits

Iñigo Urteaga; Chris H. Wiggins

arXiv:1808.02933·stat.ML·April 8, 2024·1 cites

TL;DR

This paper introduces a novel approach using sequential Monte Carlo methods to extend Bayesian multi-armed bandit algorithms, enabling effective decision-making in complex, non-stationary, and nonlinear reward environments.

Contribution

It develops SMC-based Bayesian bandit algorithms that handle nonlinear, non-stationary, and context-dependent reward distributions, surpassing limitations of traditional methods.

Findings

1. Demonstrates good regret performance in non-stationary, nonlinear bandit scenarios.

2. Addresses complex bandit problems previously considered intractable.

3. Shows effectiveness of SMC methods in dynamic, real-world settings.

Abstract

We extend Bayesian multi-armed bandit (MAB) algorithms beyond their original setting by making use of sequential Monte Carlo (SMC) methods. A MAB is a sequential decision-making problem where the goal is to learn a policy that maximizes long-term payoff, and where only the reward of the executed action is observed. In the stochastic MAB, the reward for each action is generated from an unknown distribution, often assumed to be stationary. To decide which action to take next, a MAB agent must learn the characteristics of the unknown reward distribution, e.g., compute its sufficient statistics. However, closed-form expressions for these statistics are analytically intractable except for simple, stationary cases. Here we use SMC to estimate the statistics that Bayesian MAB agents compute, and devise flexible policies that can address a rich class of bandit problems: i.e., MABs with…
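The idea in the abstract, replacing closed-form posterior statistics with SMC estimates inside a Bayesian bandit policy, can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the API of the iurteaga/bandits repository. It assumes a Gaussian reward model whose latent mean follows a random walk, a bootstrap particle filter per arm, and Thompson sampling as the policy. The names `ParticleArm` and `run_bandit`, and the parameters `drift_std` and `noise_std`, are hypothetical. The toy environment uses fixed true means, while the agent's model allows for drift.

```python
import math
import random


class ParticleArm:
    """Particle approximation of the posterior over one arm's latent mean reward."""

    def __init__(self, n_particles, rng, drift_std=0.05, noise_std=0.5):
        self.rng = rng
        self.drift_std = drift_std  # assumed random-walk step of the latent mean
        self.noise_std = noise_std  # assumed observation noise of the reward
        # Prior: latent mean ~ N(0, 1), represented by equally weighted particles.
        self.particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]

    def propagate(self):
        # Transition step: the latent mean may drift whether or not the arm is played.
        self.particles = [p + self.rng.gauss(0.0, self.drift_std)
                          for p in self.particles]

    def sample(self):
        # Thompson draw: pick one particle uniformly from the resampled set.
        return self.rng.choice(self.particles)

    def update(self, reward):
        # Weight each particle by the Gaussian likelihood of the observed reward.
        weights = [math.exp(-0.5 * ((reward - p) / self.noise_std) ** 2)
                   for p in self.particles]
        if sum(weights) == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0] * len(self.particles)
        # Multinomial resampling restores an equally weighted particle set.
        self.particles = self.rng.choices(
            self.particles, weights=weights, k=len(self.particles))

    def mean(self):
        return sum(self.particles) / len(self.particles)


def run_bandit(true_means, n_steps=500, n_particles=200, seed=0):
    """Run SMC-based Thompson sampling on a toy Gaussian bandit."""
    rng = random.Random(seed)
    arms = [ParticleArm(n_particles, rng) for _ in true_means]
    pulls = [0] * len(true_means)
    for _ in range(n_steps):
        for arm in arms:
            arm.propagate()
        # Play the arm whose sampled latent mean is largest.
        choice = max(range(len(arms)), key=lambda a: arms[a].sample())
        reward = rng.gauss(true_means[choice], 0.5)
        # Only the played arm's reward is observed, so only it is updated.
        arms[choice].update(reward)
        pulls[choice] += 1
    return pulls, [arm.mean() for arm in arms]


if __name__ == "__main__":
    pulls, estimates = run_bandit([0.0, 1.0])
    print("pulls per arm:", pulls)
    print("posterior mean estimates:", estimates)
```

Because unplayed arms are still propagated, their particle clouds widen over time, which lets the policy revisit arms whose rewards may have changed. Swapping the likelihood in `update` or the transition in `propagate` is where a nonlinear or context-dependent model would enter under the same scheme.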

Code & Models

Repositories

iurteaga/bandits (official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference