Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process

Chengchun Shi; Jin Zhu; Ye Shen; Shikai Luo; Hongtu Zhu; Rui Song

arXiv:2202.10589 · stat.ML · November 7, 2022

TL;DR

This paper develops a method to construct confidence intervals for a target policy's value in offline reinforcement learning when unmeasured confounders are present, using auxiliary variables that mediate the effect of actions to ensure identifiability.

Contribution

It develops an off-policy value estimator with confidence intervals for confounded Markov decision processes, relaxing the assumption of no unmeasured confounders on which most existing methods rely.

Findings

01. The estimator is robust to potential model misspecification

02. Provides rigorous uncertainty quantification

03. Validated on simulated and real ridesharing data

Abstract

This paper is concerned with constructing a confidence interval for a target policy's value offline, based on pre-collected observational data, in infinite-horizon settings. Most existing works assume that there are no unmeasured variables confounding the observed actions. This assumption, however, is likely to be violated in real applications such as healthcare and the technology industry. In this paper, we show that with some auxiliary variables that mediate the effect of actions on the system dynamics, the target policy's value is identifiable in a confounded Markov decision process. Based on this result, we develop an efficient off-policy value estimator that is robust to potential model misspecification and provide rigorous uncertainty quantification. Our method is justified by theoretical results and by simulated and real datasets obtained from ridesharing companies. A Python…
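The role of the mediating auxiliary variables is easiest to see in a one-step analogue, the classical front-door adjustment: if a mediator M carries the entire effect of the action A on the outcome Y and is not itself affected by the unmeasured confounder U, then E[Y | do(A=a)] = Σ_m P(m | a) Σ_a' P(a') E[Y | m, a'], and a policy that takes A=1 with probability π has value (1 − π)·E[Y | do(A=0)] + π·E[Y | do(A=1)]. The Python sketch below illustrates only this single-step case under stated assumptions; it is not the paper's infinite-horizon estimator. The data-generating process, the function names (simulate, front_door_value, bootstrap_ci) and the nonparametric percentile bootstrap used for the interval are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical single-step illustration of front-door identification;
# this is NOT the paper's infinite-horizon estimator.


def simulate(n, rng):
    """Draw toy data where an unmeasured U confounds both A and Y."""
    u = rng.binomial(1, 0.5, size=n)            # unmeasured confounder
    a = rng.binomial(1, 0.2 + 0.6 * u)          # behavior policy depends on U
    m = rng.binomial(1, 0.1 + 0.8 * a)          # mediator depends on A only
    y = 1.0 * m + 2.0 * u + rng.normal(size=n)  # reward depends on M and U, not directly on A
    return a, m, y


def front_door_value(a, m, y, pi1):
    """Plug-in front-door value of the policy taking A=1 with probability pi1."""
    p_a = np.array([np.mean(a == 0), np.mean(a == 1)])  # marginal P(A=a')
    value = 0.0
    for act, weight in ((0, 1.0 - pi1), (1, pi1)):
        for med in (0, 1):
            p_m_given_a = np.mean(m[a == act] == med)  # P(M=med | A=act)
            # sum over a' of P(a') * E[Y | M=med, A=a']
            inner = sum(p_a[ap] * y[(m == med) & (a == ap)].mean() for ap in (0, 1))
            value += weight * p_m_given_a * inner
    return value


def bootstrap_ci(a, m, y, pi1, n_boot=500, level=0.95, seed=1):
    """Nonparametric percentile-bootstrap interval for front_door_value."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        stats[b] = front_door_value(a[idx], m[idx], y[idx], pi1)
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi


rng = np.random.default_rng(0)
a, m, y = simulate(5000, rng)
est = front_door_value(a, m, y, pi1=1.0)
naive = y[a == 1].mean()  # confounded regression estimate
lo, hi = bootstrap_ci(a, m, y, pi1=1.0)
print(f"front-door {est:.3f}, naive {naive:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

In this toy model the naive estimate E[Y | A=1] is biased upward (its population value is 0.9 + 2·0.8 = 2.5) because U raises both the chance of taking A=1 and the reward, whereas the front-door estimate targets E[Y | do(A=1)] = 0.9 + 2·0.5 = 1.9.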

Code & Models

Repositories

mamba413/cope
Official