Nearly Optimal Best Arm Identification for Semiparametric Bandits
Seok-Jin Kim

TL;DR
This paper introduces a nearly optimal algorithm for fixed-confidence best arm identification in semiparametric bandits, addressing an open problem with theoretical guarantees and empirical validation.
Contribution
It presents a new phase-elimination algorithm based on an $XY$-design, achieving near-optimal sample complexity for semiparametric bandits.
Findings
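The algorithm attains a nearly optimal high-probability sample-complexity upper bound, up to log factors and an additive term.
Experiments on synthetic instances and the Jester dataset show clear gains over prior baselines.
In the transductive setting, the instance-dependent lower bound is characterized by the corresponding linear-bandit complexity on shifted features.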
Abstract
We study fixed-confidence Best Arm Identification (BAI) in semiparametric bandits, where rewards are linear in arm features plus an unknown additive baseline shift. Unlike linear-bandit BAI, this setting requires orthogonalized regression, and its instance-optimal sample complexity has remained open. For the transductive setting, we establish an attainable instance-dependent lower bound characterized by the corresponding linear-bandit complexity on shifted features. We then propose a computationally efficient phase-elimination algorithm based on a new $XY$-design for orthogonalized regression. Our analysis yields a nearly optimal high-probability sample-complexity upper bound, up to log factors and an additive term, and experiments on synthetic instances and the Jester dataset show clear gains over prior baselines.
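To make the reward model concrete, the following is a minimal sketch of orthogonalized (centered) least squares on a toy instance. It is not the paper's algorithm: the arm set `ARMS`, the uniform sampling distribution `PROBS`, the parameter `THETA`, the sinusoidal baseline, and the noise level are all hypothetical choices made for illustration. The only idea it shows is that subtracting the mean feature under the sampling distribution makes an action-independent baseline shift average out of the regression.

```python
import math
import random

# Hypothetical toy instance (not from the paper): three arms in R^2.
ARMS = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
PROBS = [1 / 3, 1 / 3, 1 / 3]   # assumed sampling distribution over arms
THETA = (1.0, 0.5)              # assumed true parameter; arm 2 is best


def simulate(n, seed=0):
    """Draw n pulls with reward = x^T theta + baseline_t + noise."""
    rng = random.Random(seed)
    pulls, rewards = [], []
    for t in range(n):
        a = rng.choices(range(len(ARMS)), weights=PROBS)[0]
        # Baseline shift: unknown to the learner, independent of the arm.
        baseline = 2.0 + math.sin(t / 50.0)
        noise = rng.gauss(0.0, 0.1)
        linear = sum(x * th for x, th in zip(ARMS[a], THETA))
        pulls.append(a)
        rewards.append(linear + baseline + noise)
    return pulls, rewards


def centered_least_squares(pulls, rewards):
    """Regress rewards on features centered by their mean under PROBS."""
    mean = [sum(p * x[k] for p, x in zip(PROBS, ARMS)) for k in range(2)]
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for a, r in zip(pulls, rewards):
        z = [ARMS[a][k] - mean[k] for k in range(2)]
        for i in range(2):
            b[i] += z[i] * r
            for j in range(2):
                A[i][j] += z[i] * z[j]
    # Solve the 2x2 system A theta = b by Cramer's rule.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [
        (A[1][1] * b[0] - A[0][1] * b[1]) / det,
        (A[0][0] * b[1] - A[1][0] * b[0]) / det,
    ]


def run_demo(n=50000, seed=0):
    """Estimate theta and return it with the index of the estimated best arm."""
    pulls, rewards = simulate(n, seed)
    theta_hat = centered_least_squares(pulls, rewards)
    scores = [sum(x * th for x, th in zip(arm, theta_hat)) for arm in ARMS]
    return theta_hat, scores.index(max(scores))
```

Calling `run_demo()` returns the estimated parameter and the index of the arm it ranks highest. Because the baseline is the same for every arm, only differences between arm scores matter for identifying the best arm, which is why the centered regression suffices here.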
