Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters

Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill

arXiv:1807.01066·cs.LG·July 11, 2018·24 cites

TL;DR

This paper investigates the importance of calibration in estimating behaviour policies for off-policy evaluation, demonstrating that simple non-parametric models can outperform neural networks in calibration and OPE accuracy.

Contribution

It highlights the critical role of calibration in behaviour policy estimation and shows that non-parametric models can yield better calibrated estimates for OPE.

Findings

1. Neural networks can produce highly uncalibrated behaviour policy models.
2. Non-parametric k-nearest neighbours models achieve better calibration.
3. Better calibration leads to more accurate importance sampling-based OPE.
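
To make "calibration" concrete, here is a minimal sketch of one common summary, a binned expected calibration error. It is an illustration under stated assumptions and not necessarily the measure used in the paper: the function name, the equal-width binning, and the inputs (a predicted probability for an event and a 0/1 indicator of whether the event occurred) are all hypothetical choices.

```python
def expected_calibration_error(predicted, observed, n_bins=10):
    """Binned gap between predicted probabilities and empirical frequencies.

    predicted: probabilities in [0, 1], e.g. the modelled probability of an action.
    observed:  0/1 indicators of whether that action was actually taken.
    Assumption: equal-width bins; this is an illustrative metric only.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        # Clamp p == 1.0 into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(predicted)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        # Weight each bin's gap by the share of samples it holds.
        ece += (len(members) / total) * abs(mean_p - freq)
    return ece
```

A model that predicts 0.8 for events occurring 80% of the time scores close to zero, while an overconfident model that predicts 0.99 for events occurring half the time scores about 0.49.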

Abstract

In this work, we consider the problem of estimating a behaviour policy for use in Off-Policy Policy Evaluation (OPE) when the true behaviour policy is unknown. Via a series of empirical studies, we demonstrate how accurate OPE is strongly dependent on the calibration of estimated behaviour policy models: how precisely the behaviour policy is estimated from data. We show how powerful parametric models such as neural networks can result in highly uncalibrated behaviour policy models on a real-world medical dataset, and illustrate how a simple, non-parametric, k-nearest neighbours model produces better calibrated behaviour policy estimates and can be used to obtain superior importance sampling-based OPE estimates.
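
The pipeline the abstract describes can be sketched roughly as follows: estimate the behaviour policy from logged data with k-nearest neighbours, then use it as the denominator of the importance weights. This is a minimal sketch under assumptions, not the authors' implementation. The function names, the Euclidean distance, the smoothing constant `alpha`, and the choice of a trajectory-level weighted importance sampling estimator are all illustrative choices not taken from the paper.

```python
import math


def knn_behaviour_policy(states, actions, n_actions, k=3, alpha=0.1):
    """Return pi_b(state) -> list of action probabilities.

    Probabilities are smoothed vote shares among the k nearest logged states.
    Assumptions: Euclidean distance; additive smoothing `alpha` so that no
    action receives probability zero (which would break importance weights).
    """
    def policy(state):
        order = sorted(range(len(states)),
                       key=lambda i: math.dist(states[i], state))
        counts = [0] * n_actions
        for i in order[:k]:
            counts[actions[i]] += 1
        n_used = min(k, len(states))
        denom = n_used + alpha * n_actions
        return [(c + alpha) / denom for c in counts]
    return policy


def weighted_importance_sampling(trajectories, pi_e, pi_b, gamma=1.0):
    """Estimate the value of evaluation policy pi_e from logged trajectories.

    trajectories: list of trajectories, each a list of (state, action, reward).
    pi_e, pi_b:   callables mapping a state to a list of action probabilities.
    """
    weights, returns = [], []
    for traj in trajectories:
        w, g = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Cumulative likelihood ratio between evaluation and behaviour policy.
            w *= pi_e(s)[a] / pi_b(s)[a]
            g += (gamma ** t) * r
        weights.append(w)
        returns.append(g)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all importance weights are zero")
    # Self-normalised (weighted) estimate.
    return sum(w * g for w, g in zip(weights, returns)) / total
```

Because the estimated behaviour probabilities sit in the denominator of every weight, a miscalibrated estimate distorts the weights multiplicatively along a trajectory, which is one way to see why calibration matters for this family of estimators.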

Taxonomy

Topics: Advanced Causal Inference Techniques · Economic Policies and Impacts