An Instrumental Variable Approach to Confounded Off-Policy Evaluation
Yang Xu, Jin Zhu, Chengchun Shi, Shikai Luo, and Rui Song

TL;DR
This paper introduces an instrumental variable approach for off-policy evaluation in confounded Markov decision processes, enabling consistent value estimation despite unmeasured confounders, with demonstrated effectiveness through simulations and real-world data.
Contribution
It develops a novel IV-based method for confounded OPE in MDPs, providing consistent estimates where existing methods fail due to unmeasured confounders.
Findings
The proposed value estimator is designed to be both efficient and robust.
Extensive simulations illustrate the estimator's effectiveness.
An analysis of real data from a world-leading short-video platform supports its practical use.
Abstract
Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In some cases, there may be unmeasured variables that can confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded Markov decision processes (MDPs). Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy's value in infinite horizon settings as well. Furthermore, we propose an efficient and robust value estimator and illustrate its effectiveness through extensive simulations and analysis of real data from a world-leading short-video platform.
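To make the single-stage intuition mentioned in the abstract concrete, the following is a minimal toy sketch in Python. It is not the paper's estimator and does not cover the MDP setting. It only illustrates how an instrument can recover an action's effect when a confounder is hidden. The data-generating process, the coefficients, and the function names (`simulate`, `naive_effect`, `wald_effect`) are all assumptions made for illustration, and the classical Wald ratio is used here as a stand-in IV estimator.

```python
import random


def simulate(n, seed=0):
    """Draw n single-stage samples (z, a, r); the confounder u is never returned."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)             # unmeasured confounder
        z = 1 if rng.random() < 0.5 else 0  # instrument, independent of u
        # The action depends on both the instrument and the confounder.
        a = 1 if 0.8 * z + 0.5 * u + rng.gauss(0.0, 1.0) > 0.5 else 0
        # Assumed true effect of the action on the reward is 2.0;
        # the instrument affects the reward only through the action.
        r = 2.0 * a + 1.5 * u + rng.gauss(0.0, 1.0)
        data.append((z, a, r))
    return data


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_effect(data):
    """Difference in mean reward between actions; biased because u is ignored."""
    treated = mean(r for _, a, r in data if a == 1)
    control = mean(r for _, a, r in data if a == 0)
    return treated - control


def wald_effect(data):
    """Wald (ratio) IV estimate: reward contrast over action contrast, both by z."""
    reward_gap = mean(r for z, _, r in data if z == 1) - mean(r for z, _, r in data if z == 0)
    action_gap = mean(a for z, a, _ in data if z == 1) - mean(a for z, a, _ in data if z == 0)
    return reward_gap / action_gap


if __name__ == "__main__":
    samples = simulate(200_000, seed=0)
    print("naive estimate:", naive_effect(samples))
    print("IV (Wald) estimate:", wald_effect(samples))
```

In this toy setup the naive contrast absorbs the confounder's influence on the reward, while the Wald ratio uses only variation induced by the instrument and should land near the assumed effect of 2.0 for large samples. The paper's contribution concerns extending this kind of identification to infinite-horizon confounded MDPs, which this sketch does not attempt.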
Taxonomy
Topics: Advanced Causal Inference Techniques
