Off-Policy Evaluation with Irregularly-Spaced, Outcome-Dependent Observation Times
Xin Chen, Wenbin Lu, Shu Yang, Dipankar Bandyopadhyay

TL;DR
This paper introduces an off-policy evaluation (OPE) framework for decision processes whose decision times are irregularly spaced and may depend on outcomes, as happens with user-initiated visits and electronic health records.
Contribution
It develops a novel OPE framework that accounts for irregular, outcome-dependent decision times, connecting Markov decision processes with renewal processes and providing statistical inference methods.
Findings
The framework models irregular, outcome-dependent observation times alongside the state-action process.
The proposed inference methods are supported by theoretical analysis.
The methods are applied to electronic health records.
Abstract
While the classic off-policy evaluation (OPE) literature commonly assumes decision time points to be evenly spaced for simplicity, in many real-world scenarios, such as those involving user-initiated visits, decisions are made at irregularly-spaced and potentially outcome-dependent time points. For a more principled evaluation of dynamic policies, this paper constructs a novel OPE framework, which concerns not only the state-action process but also an observation process dictating the time points at which decisions are made. The framework is closely connected to the Markov decision process in computer science and to the renewal process in the statistical literature. Within the framework, two distinct value functions, derived from cumulative reward and integrated reward respectively, are considered, and statistical inference for each value function is developed under revised Markov…
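To make the distinction between the two value functions concrete, the following is a minimal sketch of how a cumulative-reward return and an integrated-reward return could differ on a single trajectory with irregular observation times. It is not the paper's estimator or its exact definitions: the function names, the discounting by elapsed time (gamma ** t), and the assumption that the reward rate stays constant between visits are all illustrative choices made here.

```python
import math
from typing import Sequence


def cumulative_return(times: Sequence[float], rewards: Sequence[float],
                      gamma: float = 0.9) -> float:
    """Discounted sum of rewards collected only at the observation times.

    Assumption (not from the paper): discounting uses elapsed time gamma ** t.
    """
    return sum(gamma ** t * r for t, r in zip(times, rewards))


def integrated_return(times: Sequence[float], rates: Sequence[float],
                      horizon: float, gamma: float = 0.9) -> float:
    """Discounted integral of a reward rate over [times[0], horizon].

    Assumption (not from the paper): rates[k] holds constant on
    [times[k], times[k + 1]), and the last rate holds until `horizon`.
    """
    edges = list(times) + [horizon]
    total = 0.0
    for k, rate in enumerate(rates):
        a, b = edges[k], edges[k + 1]
        if gamma == 1.0:
            # No discounting: plain area under the step function.
            total += rate * (b - a)
        else:
            # Closed form of the integral of rate * gamma ** t over [a, b].
            total += rate * (gamma ** b - gamma ** a) / math.log(gamma)
    return total


# Hypothetical trajectory: visits at t = 0, 1, 3 with rewards 1, 2, 4.
visit_times = [0.0, 1.0, 3.0]
visit_rewards = [1.0, 2.0, 4.0]
print(cumulative_return(visit_times, visit_rewards, gamma=0.5))
print(integrated_return(visit_times, visit_rewards, horizon=4.0, gamma=0.5))
```

In this toy setup the cumulative return depends only on what is recorded at each visit, while the integrated return also depends on the length of the gaps between visits, which is why the timing of observations matters for the second quantity.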
Taxonomy
Topics: Healthcare Operations and Scheduling Optimization
