On Adaptive Estimation for Dynamic Bernoulli Bandits
Xue Lu, Niall Adams, Nikolas Kantas

TL;DR
This paper introduces adaptive estimation techniques for dynamic Bernoulli bandits, enhancing existing algorithms like $\epsilon$-Greedy, UCB, and Thompson sampling to better track changing reward distributions without prior knowledge.
Contribution
The paper proposes simple, adaptive versions of standard bandit algorithms that effectively handle non-stationary reward environments without requiring prior information.
Findings
Adaptive algorithms outperform traditional methods in dynamic settings.
The new methods are easy to implement and do not need prior knowledge.
Numerical results demonstrate significant improvements in tracking changing rewards.
Abstract
The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm is associated with a reward distribution. In static MABs, the reward distributions do not change over time, while in dynamic MABs, each arm's reward distribution can change, and the optimal arm can switch over time. Motivated by many real applications where rewards are binary, we focus on dynamic Bernoulli bandits. Standard methods like $\epsilon$-Greedy and Upper Confidence Bound (UCB), which rely on the sample mean estimator, often fail to track changes in the underlying reward for dynamic problems. In this paper, we overcome the shortcoming of slow response to change by deploying adaptive estimation in the standard methods and propose a new family…
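The contrast between the sample mean and an estimator that discounts old observations can be illustrated with a minimal sketch. This is not the paper's estimator: it assumes a fixed forgetting factor `lam` (a hypothetical parameter chosen by hand), whereas the abstract describes adaptive estimation that needs no such prior choice. The class `ForgettingMean`, the helper `epsilon_greedy_step`, and the two-arm environment in `run` (arms that swap success probabilities halfway through) are all illustrative assumptions. Setting `lam=1.0` recovers the ordinary sample mean.

```python
import random


class ForgettingMean:
    """Exponentially discounted mean of binary rewards.

    Assumption: a fixed forgetting factor `lam` in (0, 1]; lam = 1.0
    gives the plain sample mean. Hypothetical helper, not the paper's method.
    """

    def __init__(self, lam=0.95, prior=0.5):
        self.lam = lam
        self.total = 0.0   # discounted sum of rewards
        self.weight = 0.0  # discounted number of observations
        self.prior = prior  # value reported before any data is seen

    def update(self, reward):
        # Shrink the influence of past data, then add the new observation.
        self.total = self.lam * self.total + reward
        self.weight = self.lam * self.weight + 1.0

    @property
    def value(self):
        return self.total / self.weight if self.weight > 0 else self.prior


def epsilon_greedy_step(estimators, epsilon, rng):
    """Pick a random arm with probability epsilon, else the current best arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimators))
    values = [e.value for e in estimators]
    return values.index(max(values))


def run(horizon=2000, epsilon=0.1, lam=0.95, seed=0):
    """Simulate a two-armed Bernoulli bandit with one abrupt change (assumed setup)."""
    rng = random.Random(seed)
    estimators = [ForgettingMean(lam), ForgettingMean(lam)]
    total_reward = 0
    for t in range(horizon):
        # The arms' success probabilities swap halfway through the run.
        probs = (0.8, 0.2) if t < horizon // 2 else (0.2, 0.8)
        arm = epsilon_greedy_step(estimators, epsilon, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        estimators[arm].update(reward)
        total_reward += reward
    return total_reward, [e.value for e in estimators]
```

Calling `run(lam=1.0)` and `run(lam=0.95)` with the same seed gives a rough feel for how discounting affects behaviour after the swap; the outcome depends on the seed and parameters, so it should be read as an illustration rather than a reproduction of the paper's numerical results.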
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
