Optimal subsampling for the Cox proportional hazards model with massive survival data
Nan Qiao, Wangcheng Li, Feng Xiao, Cunjie Lin, Yong Zhou

TL;DR
This paper introduces an optimal subsampling method for the Cox proportional hazards model tailored for massive survival datasets, improving computational efficiency while maintaining statistical accuracy.
Contribution
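It develops a subsampling algorithm for the Cox proportional hazards model with time-dependent covariates. The subsample estimator maximizes a weighted partial likelihood and is shown to be consistent and asymptotically normal, and the optimal subsampling probabilities are derived in explicit form by minimizing the estimator's asymptotic mean squared error.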
Findings
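The subsample estimator is consistent and asymptotically normal.
Simulation studies show that it satisfactorily approximates the full-data estimator.
Applications to corporate loan and breast cancer datasets, with different censoring rates, confirm its practical advantages.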
Abstract
The use of massive survival data has become common in survival analysis. In this study, a subsampling algorithm is proposed for the Cox proportional hazards model with time-dependent covariates when the sample is extraordinarily large but computing resources are relatively limited. A subsample estimator is developed by maximizing the weighted partial likelihood; it is shown to have consistency and asymptotic normality. By minimizing the asymptotic mean squared error of the subsample estimator, the optimal subsampling probabilities are formulated with explicit expressions. Simulation studies show that the proposed method can satisfactorily approximate the estimator of the full dataset. The proposed method is then applied to corporate loan and breast cancer datasets, with different censoring rates, and the outcomes confirm its practical advantages.
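The weighted partial likelihood idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single time-fixed covariate (the paper covers time-dependent covariates) and uses uniform subsampling probabilities as a placeholder, since the explicit optimal probabilities are not given in the abstract. All function names (simulate, uniform_subsample, weighted_cox_fit) and the simulation settings are hypothetical.

```python
import math
import random
from collections import Counter


def simulate(n, beta, censor_rate, rng):
    """Draw (time, event, x) with hazard exp(beta * x) and exponential censoring."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        t_event = rng.expovariate(math.exp(beta * x))
        t_cens = rng.expovariate(censor_rate)
        data.append((min(t_event, t_cens), t_event <= t_cens, x))
    return data


def uniform_subsample(n, r, rng):
    """Draw r indices with replacement using pi_i = 1/n (a placeholder for
    optimal probabilities); return {index: weight}, weight = count / (r * pi_i)."""
    pi = 1.0 / n
    counts = Counter(rng.randrange(n) for _ in range(r))
    return {i: c / (r * pi) for i, c in counts.items()}


def weighted_cox_fit(rows, weights, iters=25, tol=1e-10):
    """Newton-Raphson for a one-covariate weighted Cox partial likelihood."""
    # Sort by decreasing time so running sums give the risk-set totals.
    order = sorted(range(len(rows)), key=lambda i: -rows[i][0])
    beta = 0.0
    for _ in range(iters):
        s0 = s1 = s2 = 0.0      # weighted risk-set sums of 1, x, x^2
        score = info = 0.0      # first derivative and negative second derivative
        for i in order:
            _, event, x = rows[i]
            risk = weights[i] * math.exp(beta * x)
            s0 += risk
            s1 += risk * x
            s2 += risk * x * x
            if event:
                mean = s1 / s0
                score += weights[i] * (x - mean)
                info += weights[i] * (s2 / s0 - mean * mean)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


rng = random.Random(0)
full = simulate(20000, 0.5, 0.3, rng)                 # true beta = 0.5 (assumed)
beta_full = weighted_cox_fit(full, [1.0] * len(full))  # full-data estimate

picked = uniform_subsample(len(full), 2000, rng)
idx = list(picked)
beta_sub = weighted_cox_fit([full[i] for i in idx], [picked[i] for i in idx])

print("full-data estimate:", beta_full)
print("subsample estimate:", beta_sub)
```

Replacing the constant pi in uniform_subsample with data-dependent probabilities, while keeping the inverse-probability weights, is where an optimal design such as the one proposed in the paper would plug in.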
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Liver Disease Diagnosis and Treatment
