Off-Policy Evaluation and Learning for Survival Outcomes under Censoring
Kohsuke Kubota, Mitsuhiro Takahashi, Yuta Saito

TL;DR
This paper introduces off-policy evaluation and learning methods for right-censored survival outcomes, yielding unbiased and doubly robust policy-value estimates for high-stakes decision-making.
Contribution
It develops the IPCW-IPS and IPCW-DR estimators, which correct for censoring bias, and proves their unbiasedness and double robustness, extending off-policy evaluation to survival outcomes.
Findings
Proposed estimators are unbiased under censoring.
IPCW-DR achieves double robustness.
The proposed methods outperform existing approaches in simulations and on real data.
Abstract
Optimizing survival outcomes, such as patient survival or customer retention, is a critical objective in data-driven decision-making. Off-Policy Evaluation (OPE) provides a powerful framework for assessing such decision-making policies using logged data alone, without the need for costly or risky online experiments in high-stakes applications. However, typical estimators are not designed to handle right-censored survival outcomes, as they ignore unobserved survival times beyond the censoring time, leading to systematic underestimation of the true policy performance. To address this issue, we propose a novel framework for OPE and Off-Policy Learning (OPL) tailored for survival outcomes under censoring. Specifically, we introduce IPCW-IPS and IPCW-DR, which employ the Inverse Probability of Censoring Weighting technique to explicitly deal with censoring bias. We theoretically establish…
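The underestimation described in the abstract, and the way inverse probability of censoring weighting can correct it, can be illustrated with a small simulation. The sketch below is not the authors' definition of IPCW-IPS; it only shows the general idea of multiplying the usual importance weight by an inverse censoring-survival weight. Everything in it is an assumption made for illustration: the synthetic data, the function name `simulate_estimates`, the two-action setup, the truncation horizon `tau`, and in particular a censoring distribution that is independent of everything else and has a known survival function. In practice that function would have to be estimated, for example with a Kaplan-Meier fit.

```python
import numpy as np


def simulate_estimates(n=200_000, seed=0, lam=0.3, tau=3.0):
    """Compare naive IPS with an IPCW-weighted IPS on synthetic censored data.

    All modelling choices here are illustrative assumptions, not the paper's setup.
    """
    rng = np.random.default_rng(seed)

    # Context and a logging policy over two actions (0 and 1).
    x = rng.uniform(size=n)
    p_log = 0.3 + 0.4 * x            # logging probability of action 1
    p_tgt = np.full(n, 0.8)          # target-policy probability of action 1
    a = rng.binomial(1, p_log)

    # True survival time depends on the action; outcome is truncated at tau.
    mean_t = np.where(a == 1, 2.0, 1.0)
    t = np.minimum(rng.exponential(mean_t), tau)

    # Independent exponential censoring time with rate lam.
    c = rng.exponential(1.0 / lam, size=n)

    # Only the minimum and the event indicator are observed.
    y = np.minimum(t, c)
    delta = (t <= c).astype(float)

    # Importance weight: target policy over logging policy for the logged action.
    pi0 = np.where(a == 1, p_log, 1.0 - p_log)
    pi = np.where(a == 1, p_tgt, 1.0 - p_tgt)
    w = pi / pi0

    # Assumed-known censoring survival function G(t) = P(C >= t).
    g = np.exp(-lam * y)

    naive_ips = np.mean(w * y)               # treats censored times as outcomes
    ipcw_ips = np.mean(w * delta * y / g)    # keeps uncensored rows, reweighted

    # Closed-form value of the target policy for this synthetic setup.
    true_value = (0.8 * 2.0 * (1.0 - np.exp(-tau / 2.0))
                  + 0.2 * 1.0 * (1.0 - np.exp(-tau / 1.0)))
    return naive_ips, ipcw_ips, true_value


if __name__ == "__main__":
    naive_ips, ipcw_ips, true_value = simulate_estimates()
    print(f"true value: {true_value:.3f}")
    print(f"naive IPS:  {naive_ips:.3f}")
    print(f"IPCW-IPS:   {ipcw_ips:.3f}")
```

In this setup the naive estimate falls well below the true value because censored rows contribute their censoring time rather than their survival time, while the reweighted estimate should land close to it. The truncation at `tau` keeps the censoring survival probability bounded away from zero, which keeps the inverse weights from exploding.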
Taxonomy
Topics: Advanced Causal Inference Techniques · Advanced Bandit Algorithms Research · Statistical Methods and Inference
