Deeply-Debiased Off-Policy Interval Estimation
Chengchun Shi, Runzhe Wan, Victor Chernozhukov, Rui Song

TL;DR
This paper introduces a method for constructing reliable confidence intervals for a target policy's value in off-policy evaluation. It quantifies the uncertainty of the point estimate through a deeply-debiased procedure.
Contribution
The paper proposes a deeply-debiased procedure for confidence interval estimation in off-policy evaluation. It backs the procedure with theoretical justification and a practical implementation.
Findings
The method provides robust and efficient confidence intervals.
The approach is validated through numerical experiments.
The procedure is flexible and theoretically sound.
Abstract
Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel deeply-debiased procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.
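The sketch below makes the setting in the abstract concrete. It shows the simplest form of off-policy interval estimation: an importance-sampling point estimate with a normal-approximation (Wald) CI, computed in a hypothetical two-action bandit. This is a generic baseline. It is not the paper's deeply-debiased procedure, which is implemented in the linked repository. The policies, reward means, and function names are all illustrative assumptions.

```python
import math
import random
import statistics

# Hypothetical behavior policy b(a): uniform over actions {0, 1}.
def behavior_prob(action):
    return 0.5

# Hypothetical target policy pi(a): prefers action 1.
def target_prob(action):
    return 0.8 if action == 1 else 0.2

def simulate_logged_data(n, seed=0):
    """Generate (action, reward) pairs logged under the behavior policy.

    Assumed reward model: Bernoulli with mean 0.7 for action 1 and 0.3
    for action 0, so the true target value is 0.2*0.3 + 0.8*0.7 = 0.62.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        action = 1 if rng.random() < 0.5 else 0
        mean = 0.7 if action == 1 else 0.3
        reward = 1.0 if rng.random() < mean else 0.0
        data.append((action, reward))
    return data

def is_estimate_with_ci(data, z=1.96):
    """Importance-sampling estimate of the target value with a Wald CI.

    Each logged reward is reweighted by pi(a) / b(a); the CI uses the
    sample standard error and a normal approximation (z=1.96 -> ~95%).
    """
    terms = [target_prob(a) / behavior_prob(a) * r for a, r in data]
    est = statistics.fmean(terms)
    se = statistics.stdev(terms) / math.sqrt(len(terms))
    return est, (est - z * se, est + z * se)

data = simulate_logged_data(5000, seed=0)
est, (lo, hi) = is_estimate_with_ci(data)
print(f"estimate={est:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

The importance weights correct for the mismatch between the behavior and target policies. The interval is valid only under a central-limit approximation and when the behavior probabilities are known exactly.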
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
