Deeply-Debiased Off-Policy Interval Estimation

Chengchun Shi; Runzhe Wan; Victor Chernozhukov; Rui Song

arXiv:2105.04646 · stat.ML · June 9, 2021 · 6 cites

TL;DR

This paper introduces a deeply-debiased procedure for constructing confidence intervals in off-policy evaluation, quantifying the uncertainty of a target policy's estimated value.

Contribution

It proposes a deeply-debiased procedure for off-policy confidence interval estimation, combining theoretical justification with practical implementation.

Findings

01. The method provides robust and efficient confidence intervals.

02. The approach is validated through numerical experiments.

03. The procedure is flexible and theoretically sound.

Abstract

Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel deeply-debiasing procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.
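
To make the setting concrete, here is a minimal Python sketch of off-policy evaluation with a confidence interval in a two-armed bandit. It uses a plain importance-sampling estimator with a normal-approximation (Wald) interval, not the deeply-debiased procedure proposed in the paper. The function names, policies, and reward model are all hypothetical and chosen only for illustration.

```python
import math
import random


def is_value_ci(rewards, target_probs, behavior_probs, z=1.96):
    """Importance-sampling value estimate with a Wald confidence interval.

    Illustrative baseline only; this is NOT the paper's deeply-debiased method.
    Assumes the behavior policy's action probabilities are known and nonzero.
    """
    n = len(rewards)
    # Reweight each reward by the ratio of target to behavior action probability.
    weighted = [r * p / b for r, p, b in zip(rewards, target_probs, behavior_probs)]
    mean = sum(weighted) / n
    # Unbiased sample variance of the reweighted rewards.
    var = sum((w - mean) ** 2 for w in weighted) / (n - 1)
    half_width = z * math.sqrt(var / n)
    return mean, (mean - half_width, mean + half_width)


def simulate(n=5000, seed=0):
    """Generate logged data from a hypothetical two-armed bandit."""
    rng = random.Random(seed)
    behavior = [0.5, 0.5]     # assumed behavior policy (generated the data)
    target = [0.2, 0.8]       # assumed target policy (to be evaluated)
    reward_mean = [0.0, 1.0]  # assumed mean reward of each arm
    rewards, target_probs, behavior_probs = [], [], []
    for _ in range(n):
        action = 0 if rng.random() < behavior[0] else 1
        rewards.append(reward_mean[action] + rng.gauss(0.0, 1.0))
        target_probs.append(target[action])
        behavior_probs.append(behavior[action])
    return rewards, target_probs, behavior_probs


rewards, target_probs, behavior_probs = simulate()
estimate, (lower, upper) = is_value_ci(rewards, target_probs, behavior_probs)
# In this simulation the true target value is 0.2 * 0.0 + 0.8 * 1.0 = 0.8.
print(f"estimate={estimate:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

This interval is justified only asymptotically and relies on known behavior probabilities. For the authors' own procedure, see the D2OPE repository linked in the abstract.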

Code & Models

Repositories

RunzheStat/D2OPE
tf · Official

Videos

Deeply-Debiased Off-Policy Interval Estimation · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms