Q-learning with censored data
Yair Goldberg, Michael R. Kosorok

TL;DR
This paper introduces a Q-learning algorithm for multistage decision problems whose rewards are censored survival times. The algorithm comes with theoretical guarantees and is aimed at finding individualized treatment regimens in clinical trials.
Contribution
It develops a censored-data-adjusted Q-learning method for multistage survival decision problems with a flexible number of stages, together with finite sample bounds on the generalization error of the learned policy and a convergence guarantee.
Findings
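The algorithm adjusts Q-learning for censored survival times and allows a flexible number of stages.
When the optimal Q-function belongs to the approximation space, the expected survival time of the learned policies converges to that of the optimal policy.
In a simulated multistage clinical trial, the algorithm is used to find individualized treatment regimens, with implications for the design of personalized medicine trials in cancer and other life-threatening diseases.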
Abstract
We develop methodology for a multistage decision problem with a flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite sample bounds on the generalization error of the policy learned by the algorithm, and show that when the optimal Q-function belongs to the approximation space, the expected survival time for policies obtained by the algorithm converges to that of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find individualized treatment regimens. The methodology presented in this paper has implications for the design of personalized medicine trials in cancer and in other life-threatening diseases.
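The abstract does not say how the censoring adjustment is carried out, so the following is only a minimal sketch of one plausible approach, not the authors' algorithm. It assumes inverse-probability-of-censoring weighting with a Kaplan-Meier estimate of the censoring distribution, a linear Q-function, binary treatments, and exactly two stages instead of a flexible number. All function names, the feature map, and the toy data generator are hypothetical.

```python
import numpy as np

def km_censoring_survival(times, event):
    """Kaplan-Meier estimate of G(t-) = P(C >= t) for the censoring time C.
    `event` is 1 when the failure was observed and 0 when censored.
    Assumes no tied observation times (true for continuous toy data)."""
    order = np.argsort(times)
    t_sorted = times[order]
    censored = 1 - event[order]          # censoring is the "event" for G
    at_risk = len(times) - np.arange(len(times))
    surv = np.cumprod(1.0 - censored / at_risk)

    def G(t):
        # Number of observations strictly before t gives the left limit G(t-).
        idx = np.searchsorted(t_sorted, t, side="left")
        return np.where(idx > 0, surv[np.maximum(idx - 1, 0)], 1.0)
    return G

def features(state, action):
    # Assumed linear Q-function with a state-by-action interaction.
    return np.column_stack([np.ones_like(state), state, action, state * action])

def weighted_lstsq(X, y, w):
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def q_values(beta, state):
    # Q(state, a) for a in {0, 1}, one column per action.
    return np.stack(
        [features(state, np.full(state.shape, float(a))) @ beta for a in (0, 1)],
        axis=1,
    )

def greedy_action(beta, state):
    return q_values(beta, state).argmax(axis=1)

def fit_censored_q(s1, a1, r1, s2, a2, r2, y, delta):
    """Backward induction on uncensored trajectories, weighted by 1 / G(T-)."""
    G = km_censoring_survival(y, delta)
    keep = delta == 1
    w = 1.0 / np.clip(G(y[keep]), 1e-3, None)   # clip to avoid huge weights
    # Stage 2: regress the stage-2 survival time on (state, action).
    beta2 = weighted_lstsq(features(s2[keep], a2[keep]), r2[keep], w)
    v2 = q_values(beta2, s2[keep]).max(axis=1)
    # Stage 1: regress stage-1 survival time plus the estimated stage-2 value.
    beta1 = weighted_lstsq(features(s1[keep], a1[keep]), r1[keep] + v2, w)
    return beta1, beta2

def simulate_toy_trial(n=2000, seed=0):
    """Hypothetical toy data: treatment 1 helps only when the state is positive."""
    rng = np.random.default_rng(seed)
    out = {}
    for k in (1, 2):
        s = rng.uniform(-1.0, 1.0, n)
        a = rng.integers(0, 2, n).astype(float)
        out[f"s{k}"], out[f"a{k}"] = s, a
        out[f"r{k}"] = 2.0 + (2 * a - 1) * s + 0.1 * rng.standard_normal(n)
    total = out["r1"] + out["r2"]
    c = rng.exponential(8.0, n)                  # independent censoring time
    out["y"], out["delta"] = np.minimum(total, c), (total <= c).astype(int)
    return out
```

Calling `fit_censored_q(**simulate_toy_trial())` returns one coefficient vector per stage, and `greedy_action(beta, state)` reads off the estimated treatment rule. The sketch discards censored trajectories entirely and reweights the rest, which is the simplest form of the idea; it does not reproduce the paper's handling of a flexible number of stages or its theoretical conditions.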
