Q-learning with censored data

Yair Goldberg; Michael R. Kosorok

arXiv:1205.6659·math.ST·May 31, 2012

TL;DR

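This paper introduces a Q-learning algorithm for multistage decision problems in which the rewards are censored survival times. The algorithm comes with finite sample bounds on the generalization error of the learned policy and is applied to a simulated multistage clinical trial to find individualized treatment regimens.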

Contribution

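It develops a Q-learning method adjusted for censored data that allows a flexible number of stages, with finite sample bounds on the generalization error and a convergence guarantee that holds when the optimal Q-function belongs to the approximation space.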

Findings

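01 The Q-learning algorithm is adjusted for censored survival times and allows a flexible number of stages.

02 When the optimal Q-function belongs to the approximation space, the expected survival time of the learned policies converges to that of the optimal policy.

03 In a simulated multistage clinical trial, the algorithm finds individualized treatment regimens, with implications for the design of personalized medicine trials in cancer and other life-threatening diseases.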

Abstract

We develop a methodology for a multistage decision problem with a flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite sample bounds on the generalization error of the policy learned by the algorithm, and show that when the optimal Q-function belongs to the approximation space, the expected survival time for policies obtained by the algorithm converges to that of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find individualized treatment regimens. The methodology presented in this paper has implications for the design of personalized medicine trials in cancer and in other life-threatening diseases.
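
To make "adjusted for censored data" concrete, the following is a minimal single-stage sketch in Python. It assumes inverse-probability-of-censoring weighting (IPCW), a standard survival-analysis device; the abstract does not spell out the adjustment, so this is an illustrative assumption and not the authors' algorithm. The function names, the two-action setup, and the linear Q-function are all hypothetical.

```python
import numpy as np


def censoring_survival_left(times, event):
    """Kaplan-Meier estimate of G(t-) = P(C >= t) at each observed time.

    `event` is 1 when the survival time was observed and 0 when it was
    censored. Ties between failures and censorings are handled naively here.
    """
    times = np.asarray(times, dtype=float)
    censored = 1 - np.asarray(event).astype(int)
    g_left = np.ones_like(times)
    for i, t in enumerate(times):
        prod = 1.0
        for s in np.unique(times[times < t]):
            at_risk = np.sum(times >= s)
            n_cens = np.sum((times == s) & (censored == 1))
            prod *= 1.0 - n_cens / at_risk
        g_left[i] = prod
    return g_left


def ipcw_weights(times, event):
    """Weight uncensored rows by 1 / G(T-); censored rows get weight 0."""
    event = np.asarray(event, dtype=float)
    return event / censoring_survival_left(times, event)


def q_features(x, a):
    """Hypothetical linear Q-function basis: [1, x, a, a * x]."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    return np.column_stack([np.ones_like(x), x, a, a * x])


def fit_weighted_q(features, targets, weights):
    """Weighted least squares: scale rows by sqrt(weight), then solve."""
    sw = np.sqrt(np.asarray(weights, dtype=float))
    coef, *_ = np.linalg.lstsq(features * sw[:, None],
                               np.asarray(targets, dtype=float) * sw,
                               rcond=None)
    return coef


def greedy_action(coef, x):
    """Pick the action (0 or 1) with the larger fitted Q-value."""
    x = np.asarray(x, dtype=float)
    q0 = q_features(x, np.zeros_like(x)) @ coef
    q1 = q_features(x, np.ones_like(x)) @ coef
    return (q1 > q0).astype(int)
```

Given observed times `t`, event indicators `d`, states `x`, and actions `a`, one would call `fit_weighted_q(q_features(x, a), t, ipcw_weights(t, d))` and then `greedy_action` on new states. In a multistage problem, a weighted regression of this kind would typically be applied backward, one stage at a time, with each stage's target built from the next stage's fitted maximum.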
