A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
Zhihan Xiong, Romain Camilleri, Maryam Fazel, Lalit Jain, Kevin Jamieson

TL;DR
This paper introduces an algorithm for fixed-budget best-arm identification in linear bandits under possible non-stationarity. It stays robust when the parameters change over time, identifies the best arm quickly in benign settings, and outperforms existing methods in the authors' experiments.
Contribution
We propose the P1-RAGE algorithm that balances robustness to non-stationarity with rapid identification, filling a gap in existing linear bandit algorithms.
Findings
In the worst case, P1-RAGE performs comparably to sampling from a G-optimal design.
In benign, stationary environments, it identifies the best arm at a faster rate.
Empirical results show P1-RAGE outperforms existing algorithms across various scenarios.
Abstract
We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X} \subset \mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\{\theta_t\}_{t=1}^{T}$, an algorithm will aim to correctly identify the best arm $x^* := \arg\max_{x \in \mathcal{X}} x^\top \sum_{t=1}^{T} \theta_t$ with probability as high as possible. Prior work has addressed the stationary setting where $\theta_t = \theta_1$ for all $t$ and demonstrated that the error probability decreases as $\exp(-T/\rho^*)$ for a problem-dependent constant $\rho^*$. But in many real-world A/B/n multivariate testing scenarios that motivate our work, the environment is non-stationary and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well-known…
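The problem setup can be made concrete with a small simulation. The sketch below is not P1-RAGE; it only illustrates a non-adaptive baseline of the kind the abstract contrasts against. It rests on several assumptions that are not taken from the source: the best arm is taken to be the one maximizing $x^\top \sum_t \theta_t$, arms are drawn from a fixed uniform design (not a computed G-optimal design), the dimension is 2, and the estimator is a standard inverse-propensity-style estimate $A(\lambda)^{-1} x_t r_t$ with $A(\lambda) = \sum_x \lambda_x x x^\top$. All function and variable names are hypothetical.

```python
import random


def best_arm(arms, thetas):
    """Index of the arm maximizing x^T sum_t theta_t (assumed definition)."""
    d = len(arms[0])
    total = [sum(th[i] for th in thetas) for i in range(d)]
    scores = [sum(x[i] * total[i] for i in range(d)) for x in arms]
    return max(range(len(arms)), key=scores.__getitem__)


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, e) = m
    det = a * e - b * c
    return [[e / det, -b / det], [-c / det, a / det]]


def robust_identify(arms, thetas, design, noise_sd=1.0, seed=0):
    """Non-adaptive sampling from a fixed design, then pick the top arm.

    Each round contributes A(design)^{-1} x_t r_t, which is an unbiased
    estimate of theta_t because E[x_t x_t^T] = A(design). Summing over
    rounds therefore estimates sum_t theta_t even when theta_t drifts.
    """
    rng = random.Random(seed)
    # Design matrix A = sum_x design[x] * x x^T (2-dimensional arms only).
    A = [[sum(p * x[i] * x[j] for p, x in zip(design, arms))
          for j in range(2)] for i in range(2)]
    A_inv = inverse_2x2(A)
    est = [0.0, 0.0]
    for theta in thetas:
        x = rng.choices(arms, weights=design)[0]
        reward = x[0] * theta[0] + x[1] * theta[1] + rng.gauss(0.0, noise_sd)
        for i in range(2):
            est[i] += (A_inv[i][0] * x[0] + A_inv[i][1] * x[1]) * reward
    scores = [x[0] * est[0] + x[1] * est[1] for x in arms]
    return max(range(len(arms)), key=scores.__getitem__)


# Hypothetical instance: the parameter switches halfway through the budget.
T = 20000
arms = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]
thetas = [(1.0, 0.0)] * (T // 2) + [(0.0, 1.5)] * (T // 2)
design = [1 / 3, 1 / 3, 1 / 3]  # uniform design, an illustrative choice

guess = robust_identify(arms, thetas, design)
```

Because the design is fixed in advance, the estimate does not depend on when the parameter shifts. An adaptive method that discards arms based on early rewards alone would see only $\theta = (1, 0)$ in the first half of this instance and could favor the first arm, which is the failure mode the abstract describes for algorithms that expect stationarity.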
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Advanced Statistical Process Monitoring
