Identifying the Best Arm in the Presence of Global Environment Shifts
Phurinut Srisawad, Juergen Branke, Long Tran-Thanh

TL;DR
This paper introduces a new problem setting for best-arm identification in non-stationary bandits affected by global environmental shifts, proposing novel policies that outperform existing methods in practice.
Contribution
The paper formulates a new non-stationary best-arm identification problem with global shifts and develops a selection policy and an allocation policy (LinLUCB) tailored to this setting.
Findings
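The proposed policies outperform existing methods in empirical tests.
The new selection policy is consistent and robust to global environmental shifts.
Existing solutions for Adversarial Bandits and Corrupted Bandits do not fully exploit the global shift structure and perform poorly in practice, despite their theoretical guarantees.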
Abstract
This paper formulates a new Best-Arm Identification problem in the non-stationary stochastic bandits setting, where the means of all arms are shifted in the same way due to a global influence of the environment. The aim is to identify the unique best arm across environmental change given a fixed total budget. While this setting can be regarded as a special case of Adversarial Bandits or Corrupted Bandits, we demonstrate that existing solutions tailored to those settings do not fully utilise the nature of this global influence, and thus, do not work well in practice (despite their theoretical guarantees). To overcome this issue, in this paper we develop a novel selection policy that is consistent and robust in dealing with global environmental shifts. We then propose an allocation policy, LinLUCB, which exploits information about global shifts across all arms in each environment.…
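To make the setting concrete, here is a minimal sketch of why a shared shift matters. It is not the paper's selection policy or LinLUCB. It assumes the shift is additive (reward = arm mean + environment shift + Gaussian noise), and every name in it (simulate, naive_best, shift_adjusted_best, the example means, shifts, and pull counts) is hypothetical. When arms are pulled unevenly across environments, a plain per-arm average mixes the arm's quality with the shift, whereas centring each arm's mean within an environment cancels the shift.

```python
import random

def simulate(means, shifts, pulls, noise_sd=0.01, seed=0):
    """Draw rewards as (env, arm, reward); pulls[e][i] is the pull count of arm i in env e.

    Assumption: the environment shift is added equally to every arm's mean.
    """
    rng = random.Random(seed)
    data = []
    for env, shift in enumerate(shifts):
        for arm, mu in enumerate(means):
            for _ in range(pulls[env][arm]):
                data.append((env, arm, mu + shift + rng.gauss(0.0, noise_sd)))
    return data

def naive_best(data, n_arms):
    """Pick the arm with the highest plain average reward, ignoring environments."""
    totals = [0.0] * n_arms
    counts = [0] * n_arms
    for _env, arm, reward in data:
        totals[arm] += reward
        counts[arm] += 1
    averages = [t / c for t, c in zip(totals, counts)]
    return max(range(n_arms), key=averages.__getitem__)

def shift_adjusted_best(data, n_arms, n_envs):
    """Centre each arm's mean within an environment, then sum the centred scores.

    Requires every arm to be pulled at least once in every environment.
    """
    totals = [[0.0] * n_arms for _ in range(n_envs)]
    counts = [[0] * n_arms for _ in range(n_envs)]
    for env, arm, reward in data:
        totals[env][arm] += reward
        counts[env][arm] += 1
    scores = [0.0] * n_arms
    for env in range(n_envs):
        env_means = [totals[env][a] / counts[env][a] for a in range(n_arms)]
        baseline = sum(env_means) / n_arms  # contains the shared shift
        for arm in range(n_arms):
            scores[arm] += env_means[arm] - baseline
    return max(range(n_arms), key=scores.__getitem__)

# Hypothetical example: arm 1 is truly best, but arm 0 is mostly pulled
# in the environment with the large positive shift.
means = [0.5, 0.6]
shifts = [0.0, 1.0]
pulls = [[2, 20], [20, 2]]
data = simulate(means, shifts, pulls)
print("naive choice:", naive_best(data, n_arms=2))
print("shift-adjusted choice:", shift_adjusted_best(data, n_arms=2, n_envs=2))
```

In this example the naive average favours arm 0 because most of its pulls fall in the high-shift environment, while the centred score recovers arm 1. The centring step only works here because the shift is assumed to be identical across arms within an environment.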
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Forecasting Techniques and Applications
