Unbiased Estimation of the Value of an Optimized Policy
Elon Portugaly, Joseph J. Pfeiffer III

TL;DR
This paper introduces a method for learning optimized policies from A/B test data while obtaining unbiased estimates of their value, using bagging and out-of-bag evaluation. Such estimates matter most when deploying a policy is costly or risky.
Contribution
It proposes a novel unbiased estimation procedure for the value of optimized policies using bagging and out-of-bag methods, applicable to any policy learning approach.
Findings
Unbiased policy value estimation is achievable with the proposed method.
The method can identify policies with positive value even when the average treatment effect is negative.
Empirical results demonstrate the effectiveness of the unbiased estimator.
Abstract
Randomized trials, also known as A/B tests, are used to select between two policies: a control and a treatment. Given a corresponding set of features, we can ideally learn an optimized policy P that maps the features of the A/B test data to the action space and optimizes reward. However, although A/B testing provides an unbiased estimator for the value of deploying B (i.e., switching from policy A to B), directly applying those samples to learn the optimized policy P generally does not provide an unbiased estimator of the value of P, because the same samples were observed when constructing P. In situations where the cost and risks associated with deploying a policy are high, such an unbiased estimator is highly desirable. We present a procedure for learning optimized policies and getting unbiased estimates for the value of deploying them. We wrap any policy learning procedure with a bagging process…
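The abstract is truncated here, so the following is only a minimal sketch of the general idea of wrapping a policy learner in bagging and scoring each sample out of bag; it is not the authors' exact procedure. The learner `learn_policy`, the two-segment feature split, the majority vote, the inverse-propensity weighting, and the synthetic data are all assumptions made for illustration. The key property is that the action assigned to sample i comes only from policies trained on bootstrap resamples that did not contain sample i, so the sample's own reward never influences the policy it is used to evaluate.

```python
import numpy as np

def learn_policy(x, a, r):
    """Hypothetical toy learner: in each segment (x <= 0, x > 0),
    choose the arm with the higher mean observed reward."""
    policy = np.zeros(2, dtype=int)  # default to control (action 0)
    segment = (x > 0).astype(int)
    for seg in (0, 1):
        treated = (segment == seg) & (a == 1)
        control = (segment == seg) & (a == 0)
        if treated.any() and control.any():
            policy[seg] = int(r[treated].mean() > r[control].mean())
    return policy  # policy[seg] is the action chosen for that segment

def oob_policy_value(x, a, r, n_bags=200, p_treat=0.5, seed=0):
    """Out-of-bag, inverse-propensity-weighted estimate of policy value."""
    rng = np.random.default_rng(seed)
    n = len(x)
    segment = (x > 0).astype(int)
    votes = np.zeros(n)   # out-of-bag votes for treatment, per sample
    counts = np.zeros(n)  # number of bags in which the sample was out of bag
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)  # bootstrap resample (in-bag indices)
        oob = np.ones(n, dtype=bool)
        oob[idx] = False                  # samples never drawn are out of bag
        policy = learn_policy(x[idx], a[idx], r[idx])
        votes[oob] += policy[segment[oob]]
        counts[oob] += 1
    covered = counts > 0
    # Majority vote over policies that never saw the sample; ties go to control.
    action = (2 * votes[covered] > counts[covered]).astype(int)
    propensity = np.where(a[covered] == 1, p_treat, 1 - p_treat)
    match = a[covered] == action          # logged action agrees with the policy
    return float(np.mean(match * r[covered] / propensity))

# Synthetic A/B test (assumed data): treatment helps when x > 0 and hurts otherwise,
# so the average treatment effect is negative while a targeted policy has positive value.
rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=n)
a = rng.integers(0, 2, size=n)            # randomized 50/50 assignment
effect = np.where(x > 0, 1.0, -2.0)
r = a * effect + rng.normal(scale=0.5, size=n)

ate = r[a == 1].mean() - r[a == 0].mean()  # standard A/B difference in means
value = oob_policy_value(x, a, r)
print("estimated ATE:", ate)
print("out-of-bag policy value:", value)
```

In this synthetic setup the control arm has mean reward zero, so the policy value can be read directly as the gain over control. Swapping in a different learner only requires replacing `learn_policy`; the out-of-bag wrapper does not depend on how the policy is fit.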
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods and Inference · Statistical Methods in Clinical Trials
