Best Arm Identification with Possibly Biased Offline Data
Le Yang, Vincent Y. F. Tan, Wang Chi Cheung

TL;DR
This paper studies best arm identification when the available offline data may be biased. It proposes an adaptive algorithm that balances offline and online data, with theoretical guarantees and empirical validation.
Contribution
Introduces LUCB-H, an adaptive algorithm for BAI with possibly biased offline data, with a sample-complexity analysis and experiments showing improved performance over standard LUCB when the offline data is helpful.
Findings
LUCB-H matches standard LUCB when offline data is misleading.
LUCB-H outperforms standard LUCB when offline data is helpful.
Numerical experiments confirm LUCB-H's robustness and adaptability.
Abstract
We study the best arm identification (BAI) problem with potentially biased offline data in the fixed confidence setting, which commonly arises in real-world scenarios such as clinical trials. We prove an impossibility result for adaptive algorithms without prior knowledge of the bias bound between online and offline distributions. To address this, we propose the LUCB-H algorithm, which introduces adaptive confidence bounds by incorporating an auxiliary bias correction to balance offline and online data within the LUCB framework. Theoretical analysis shows that LUCB-H matches the sample complexity of standard LUCB when offline data is misleading and significantly outperforms it when offline data is helpful. We also derive an instance-dependent lower bound that matches the upper bound of LUCB-H in certain scenarios. Numerical experiments further demonstrate the robustness and adaptability…
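The abstract's idea of a confidence bound that balances offline and online data can be illustrated with a small sketch. This is not the paper's definition of LUCB-H: the function names (`radius`, `hybrid_bounds`), the Hoeffding-style radius for rewards in [0, 1], and the specific form of the bias correction are all assumptions made for illustration. The sketch intersects an online-only interval with a pooled interval that is widened by a known bias bound, so the pooled data only helps when it yields a tighter interval.

```python
import math


def radius(n, delta):
    """Two-sided Hoeffding radius for n samples in [0, 1] (assumed reward model).

    A full algorithm would also union-bound over arms and rounds; omitted here.
    """
    if n == 0:
        return math.inf
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def hybrid_bounds(online_sum, n_on, offline_sum, n_off, bias_bound, delta):
    """Return (lcb, ucb) for one arm's online mean.

    bias_bound is an assumed known upper bound on |offline mean - online mean|.
    """
    # Interval built from online samples only: always valid, ignores offline data.
    if n_on > 0:
        mean_on = online_sum / n_on
        r_on = radius(n_on, delta)
        lcb_on, ucb_on = mean_on - r_on, mean_on + r_on
    else:
        lcb_on, ucb_on = -math.inf, math.inf

    # Interval built from pooled samples, widened by the worst-case bias that
    # the offline samples can inject into the pooled mean.
    n = n_on + n_off
    if n > 0:
        mean_pool = (online_sum + offline_sum) / n
        shift = bias_bound * n_off / n
        r_pool = radius(n, delta)
        lcb_pool = mean_pool - r_pool - shift
        ucb_pool = mean_pool + r_pool + shift
    else:
        lcb_pool, ucb_pool = -math.inf, math.inf

    # Keep whichever bound is tighter on each side.
    return max(lcb_on, lcb_pool), min(ucb_on, ucb_pool)
```

With a small bias bound and many offline samples, the pooled interval is the tighter one. With a large bias bound, the result falls back to the online-only interval, which mirrors the abstract's claim that the method is no worse than standard LUCB when the offline data is misleading.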
Taxonomy
Topics: Machine Learning and Algorithms · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research
