Adaptive Experimental Design for Policy Learning
Masahiro Kato, Kyohei Okumura, Takuya Ishihara, Toru Kitagawa

TL;DR
This paper introduces PLAS, an adaptive experimental design method for contextual best arm identification, and shows that it is minimax rate-optimal with respect to worst-case expected simple regret.
Contribution
It derives a lower bound for the expected simple regret in contextual best arm identification and develops an adaptive sampling and policy learning strategy (PLAS) that is proven to be minimax rate-optimal.
Findings
PLAS matches the lower bound for simple regret in the worst case.
The strategy is proven to be minimax rate-optimal as the number of units grows.
The learned policy recommends the estimated best treatment arm for each context, backed by these regret guarantees.
Abstract
This study investigates the contextual best arm identification (BAI) problem, aiming to design an adaptive experiment to identify the best treatment arm conditioned on contextual information (covariates). We consider a decision-maker who assigns treatment arms to experimental units during an experiment and recommends the estimated best treatment arm based on the contexts at the end of the experiment. The decision-maker uses a policy for recommendations, which is a function that provides the estimated best treatment arm given the contexts. In our evaluation, we focus on the worst-case expected regret, a relative measure between the expected outcomes of an optimal policy and our proposed policy. We derive a lower bound for the expected simple regret and then propose a strategy called Adaptive Sampling-Policy Learning (PLAS). We prove that this strategy is minimax rate-optimal in the sense…
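The setting described in the abstract (assign arms during the experiment, recommend a policy at the end, evaluate by expected simple regret) can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' PLAS strategy: contexts are assumed discrete, the sampling rule (draw arms in proportion to their estimated outcome standard deviations) is an illustrative stand-in, and the recommendation is a plain plug-in argmax of sample means. The names `simple_regret` and `run_experiment` are hypothetical.

```python
import random


def simple_regret(policy, mean_outcome, context_probs):
    """Expected simple regret of a policy over discrete contexts:
    sum_x P(x) * (max_a mu(a, x) - mu(policy[x], x))."""
    regret = 0.0
    for x, p_x in context_probs.items():
        best = max(mean_outcome[x])
        regret += p_x * (best - mean_outcome[x][policy[x]])
    return regret


def run_experiment(mean_outcome, noise_sd, context_probs, n_units, seed=0):
    """Simulate an adaptive experiment and return the recommended policy.

    Illustrative stand-in only (NOT the paper's allocation rule):
    each arm is pulled twice per context, then arms are drawn with
    probability proportional to their estimated standard deviation.
    """
    rng = random.Random(seed)
    contexts = list(context_probs)
    weights = [context_probs[x] for x in contexts]
    n_arms = len(mean_outcome[contexts[0]])
    # Per (context, arm): [count, sum of outcomes, sum of squared outcomes].
    stats = {x: [[0, 0.0, 0.0] for _ in range(n_arms)] for x in contexts}

    def estimated_sd(x, a):
        n, s, ss = stats[x][a]
        variance = max(ss / n - (s / n) ** 2, 1e-12)  # floor keeps weights positive
        return variance ** 0.5

    for _ in range(n_units):
        # A unit arrives with a context drawn from the context distribution.
        x = rng.choices(contexts, weights=weights)[0]
        under_sampled = [a for a in range(n_arms) if stats[x][a][0] < 2]
        if under_sampled:
            a = under_sampled[0]
        else:
            sds = [estimated_sd(x, arm) for arm in range(n_arms)]
            a = rng.choices(range(n_arms), weights=sds)[0]
        # Observe a noisy outcome for the assigned arm.
        y = rng.gauss(mean_outcome[x][a], noise_sd[x][a])
        stats[x][a][0] += 1
        stats[x][a][1] += y
        stats[x][a][2] += y * y

    # Recommendation step: plug-in argmax of the sample means per context.
    policy = {}
    for x in contexts:
        means = [s / n if n else float("-inf") for n, s, _ in stats[x]]
        policy[x] = max(range(n_arms), key=lambda arm: means[arm])
    return policy
```

A policy here is a dictionary mapping each context to an arm index, so `simple_regret(run_experiment(...), mean_outcome, context_probs)` gives the regret of the recommended policy against the best arm in each context. The worst-case criterion in the abstract would take the supremum of this quantity over a class of outcome distributions, which this sketch does not attempt.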
Taxonomy
Topics: Advanced Causal Inference Techniques · Auction Theory and Applications · Advanced Bandit Algorithms Research
