An Instrumental Variable Approach to Confounded Off-Policy Evaluation

Yang Xu, Jin Zhu, Chengchun Shi, Shikai Luo, and Rui Song

arXiv:2212.14468 · stat.ML · February 3, 2023

TL;DR

This paper introduces an instrumental variable approach for off-policy evaluation in confounded Markov decision processes, enabling consistent value estimation despite unmeasured confounders, with demonstrated effectiveness through simulations and real-world data.

Contribution

The paper develops a novel IV-based method for confounded OPE in MDPs, yielding consistent value estimates in settings where many existing methods break down because of unmeasured confounders.

Findings

01. The proposed estimator is robust and efficient.
02. The method achieves accurate value estimates in simulations.
03. Real data analysis confirms practical effectiveness.

Abstract

Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In some cases, there may be unmeasured variables that can confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded Markov decision processes (MDPs). Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy's value in infinite horizon settings as well. Furthermore, we propose an efficient and robust value estimator and illustrate its effectiveness through extensive simulations and analysis of real data from a world-leading short-video platform.
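
To make the single-stage intuition mentioned in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's estimator and does not cover the MDP setting; it only illustrates, on simulated data, why an instrument can help when a confounder is unmeasured. Everything in it is an assumption made for illustration: the data-generating process, the constant treatment effect, and the names `simulate`, `naive_estimate`, and `wald_estimate`. It uses the classical Wald ratio for a binary instrument.

```python
import numpy as np

def simulate(n=200_000, effect=1.0, seed=0):
    """Toy single-stage data (hypothetical): u confounds action and reward."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)              # unmeasured confounder
    z = rng.integers(0, 2, size=n)      # binary instrument, independent of u
    # The action depends on both the instrument and the confounder.
    a = (0.8 * z + u + rng.normal(size=n) > 0.4).astype(float)
    # The reward depends on the action and the confounder, never directly on z.
    r = effect * a + 2.0 * u + rng.normal(size=n)
    return z, a, r

def naive_estimate(a, r):
    """Difference in mean reward between actions; biased when u is unobserved."""
    return r[a == 1].mean() - r[a == 0].mean()

def wald_estimate(z, a, r):
    """Wald ratio: the instrument's effect on reward over its effect on action."""
    reward_shift = r[z == 1].mean() - r[z == 0].mean()
    action_shift = a[z == 1].mean() - a[z == 0].mean()
    return reward_shift / action_shift

z, a, r = simulate()
naive = naive_estimate(a, r)
iv = wald_estimate(z, a, r)
print(f"true effect: 1.0, naive: {naive:.2f}, IV (Wald): {iv:.2f}")
```

In this simulation the naive contrast absorbs the confounder's influence on the reward, while the Wald ratio only uses variation in the action that is induced by the instrument. The sketch relies on the instrument being independent of the confounder and affecting the reward only through the action, which holds here by construction.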

Videos

An Instrumental Variable Approach to Confounded Off-Policy Evaluation · SlidesLive

Taxonomy

Topics: Advanced Causal Inference Techniques