Experimental Design for Semiparametric Bandits

Seok-Jin Kim; Gi-Soo Kim; Min-hwan Oh

arXiv:2506.13390 · stat.ML · June 18, 2025

TL;DR

This paper introduces a novel experimental-design method for semiparametric bandits that achieves optimal regret bounds, including minimax and logarithmic regret, by refining non-asymptotic analysis of orthogonalized regression.

Contribution

It presents the first approach combining sharp regret, PAC, and best-arm guarantees for semiparametric bandits, generalizing classical linear bandit methods.

Findings

01

Attains minimax regret $\tilde{O}(\sqrt{dT})$, matching the known lower bound.

02

Achieves logarithmic regret under positive suboptimality gap.

03

Provides refined non-asymptotic analysis of orthogonalized regression.

Abstract

We study finite-armed semiparametric bandits, where each arm's reward combines a linear component with an unknown, potentially adversarial shift. This model strictly generalizes classical linear bandits and reflects complexities common in practice. We propose the first experimental-design approach that simultaneously offers a sharp regret bound, a PAC bound, and a best-arm identification guarantee. Our method attains the minimax regret $\tilde{O}(\sqrt{dT})$, matching the known lower bound for finite-armed linear bandits, and further achieves logarithmic regret under a positive suboptimality gap condition. These guarantees follow from our refined non-asymptotic analysis of orthogonalized regression that attains the optimal $\sqrt{d}$ rate, paving the way for robust and efficient learning across a broad class of semiparametric bandit problems.
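
To make the reward model in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm or its experimental design. It only illustrates the generic idea behind orthogonalized (action-centered) regression. It assumes rewards of the form $r_t = x_{a_t}^\top \theta + \nu_t + \varepsilon_t$, a fixed sampling distribution `pi` over the arms, and a shift sequence $\nu_t$ that is fixed before each action is drawn. The function name, the particular shift sequence, and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np

def centered_least_squares_demo(arms, theta, pi, T, seed=0):
    """Simulate a semiparametric bandit under a fixed sampling policy and
    estimate theta by least squares on policy-centered features.

    arms  : (K, d) array of arm feature vectors
    theta : (d,) true linear parameter (used only to simulate rewards)
    pi    : (K,) sampling probabilities over arms (assumed fixed over time)
    T     : number of rounds
    """
    rng = np.random.default_rng(seed)
    K, d = arms.shape

    # Draw the arm played in each round from the fixed policy pi.
    played = rng.choice(K, size=T, p=pi)

    # Hypothetical shift sequence: arbitrary, but chosen before the actions,
    # so it is independent of which arm is drawn in each round.
    shifts = 1.0 + np.sin(0.01 * np.arange(T))
    noise = rng.normal(0.0, 0.1, size=T)

    # Semiparametric reward: linear part + shift + noise.
    rewards = arms[played] @ theta + shifts + noise

    # Center each played feature by the policy-mean feature. The centered
    # features have mean zero under pi, so the shift term averages out.
    mean_feature = pi @ arms
    centered = arms[played] - mean_feature

    # Ordinary least squares on the centered features.
    gram = centered.T @ centered
    theta_hat = np.linalg.solve(gram, centered.T @ rewards)
    return theta_hat
```

For example, with `arms = np.vstack([np.eye(3), np.zeros((1, 3))])` and a uniform `pi`, the returned estimate should be close to `theta` even though the shift is never modeled. Note that the centered arms must span all $d$ directions, which requires at least $d + 1$ arms in general position. Otherwise the Gram matrix is singular and `np.linalg.solve` fails.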

Taxonomy

Topics: Advanced Bandit Algorithms Research