Online Survival Analysis: A Bandit Approach under Cox PH Model
Yang Xu, Wenbin Lu, Rui Song

TL;DR
This paper introduces an online bandit framework for survival analysis under the Cox PH model, addressing challenges like censoring and delayed feedback, with theoretical guarantees and empirical validation.
Contribution
It takes an initial step toward integrating survival analysis into online bandit algorithms under the Cox PH model, adapting three canonical bandit algorithms with sublinear regret guarantees and evaluating them empirically.
Findings
Achieved sublinear regret bounds for online survival analysis algorithms.
Demonstrated rapid learning of near-optimal treatment policies in simulations.
Validated the approach on SEER cancer data.
Abstract
Survival analysis is a widely used statistical framework for modeling time-to-event data under censoring. Classical methods, such as the Cox proportional hazards (Cox PH) model, offer a semiparametric approach to estimating the effects of covariates on the hazard function. Despite its importance, survival analysis has been largely unexplored in online settings, particularly within the bandit framework, where decisions must be made sequentially to optimize treatments as new data arrive over time. In this work, we take an initial step toward integrating survival analysis into a purely online learning setting under the Cox PH model, addressing key challenges including staggered entry, delayed feedback, and right censoring. We adapt three canonical bandit algorithms to balance exploration and exploitation, with theoretical guarantees of sublinear regret bounds. Extensive simulations and…
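For readers unfamiliar with the model named in the abstract: the Cox PH model writes the hazard as h(t | x) = h0(t) exp(xᵀβ) and estimates β by maximizing a partial likelihood that needs only the risk set at each observed event time, so right-censored subjects contribute to risk sets but not to event terms. The following is a minimal sketch of that standard partial log-likelihood, assuming no tied event times. The function name `cox_partial_loglik` and its arguments are illustrative choices, not taken from the paper, and this is not the authors' online algorithm.

```python
import numpy as np

def cox_partial_loglik(beta, X, time, event):
    """Cox partial log-likelihood under right censoring (assumes no tied event times).

    beta  : (p,) coefficient vector
    X     : (n, p) covariate matrix
    time  : (n,) observed time, i.e. min(event time, censoring time)
    event : (n,) 1 if the event was observed, 0 if right-censored
    """
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    eta = X @ beta  # linear predictor x_i^T beta for each subject
    loglik = 0.0
    # Only subjects with an observed event contribute a term.
    for i in np.flatnonzero(event):
        # Risk set: everyone still under observation at subject i's event time.
        at_risk = time >= time[i]
        loglik += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return loglik

# Hypothetical toy data: three subjects, the second one right-censored.
X_toy = np.array([[1.0], [2.0], [3.0]])
time_toy = np.array([1.0, 2.0, 3.0])
event_toy = np.array([1, 0, 1])
value_at_zero = cox_partial_loglik([0.0], X_toy, time_toy, event_toy)
```

With β = 0 every subject has the same relative risk, so each event term reduces to minus the log of the risk-set size. In an online setting with staggered entry and delayed feedback, the `time` and `event` arrays would have to be rebuilt from whatever follow-up has accrued by the current decision point; how the paper handles that step is not described in the abstract.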
