Approximating Partial Likelihood Estimators via Optimal Subsampling
Haixiang Zhang, Lulu Zuo, HaiYing Wang, Liuquan Sun

TL;DR
This paper introduces a fast subsampling approach that approximates the full-data maximum partial likelihood estimator in Cox's model. It substantially reduces computing time for large-scale survival data, and the subsample-based estimator remains consistent and asymptotically normal.
Contribution
It derives optimal subsampling probabilities in explicit form and a practical two-step algorithm for efficient analysis of large-scale survival data.
Findings
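The general subsample-based estimator is consistent and asymptotically normal.
The optimal subsampling probabilities minimize the trace of the asymptotic variance-covariance matrix of a linearly transformed parameter estimator.
The two-step algorithm substantially reduces computing time compared with the full-data method in real data applications.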
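The optimality claim can be made concrete with a generic derivation. This is a sketch under an assumed variance form that is common in optimal-subsampling analyses with replacement, not the paper's exact expressions: $\psi_i$ is a hypothetical per-observation influence term, $\Sigma$ the limiting information matrix, $r$ the subsample size, and the asymptotic variance of the subsample estimator is taken to be $\Sigma^{-1}V_\pi\Sigma^{-1}$. Minimizing the trace for the transformed estimator $L\tilde\beta$ then has a closed form by the Cauchy-Schwarz inequality.

```latex
V_\pi = \frac{1}{r n^2}\sum_{i=1}^{n}\frac{\psi_i\psi_i^\top}{\pi_i} + (\text{terms free of } \pi),
\qquad
\operatorname{tr}\!\big(L\Sigma^{-1}V_\pi\Sigma^{-1}L^\top\big)
  = \frac{1}{r n^2}\sum_{i=1}^{n}\frac{\|L\Sigma^{-1}\psi_i\|^2}{\pi_i} + \text{const}.

\Big(\sum_{i=1}^{n}\|L\Sigma^{-1}\psi_i\|\Big)^2
  = \Big(\sum_{i=1}^{n}\frac{\|L\Sigma^{-1}\psi_i\|}{\sqrt{\pi_i}}\,\sqrt{\pi_i}\Big)^2
  \le \sum_{i=1}^{n}\frac{\|L\Sigma^{-1}\psi_i\|^2}{\pi_i}\cdot\sum_{i=1}^{n}\pi_i
  = \sum_{i=1}^{n}\frac{\|L\Sigma^{-1}\psi_i\|^2}{\pi_i},

\pi_i^{\mathrm{opt}} = \frac{\|L\Sigma^{-1}\psi_i\|}{\sum_{j=1}^{n}\|L\Sigma^{-1}\psi_j\|},
\qquad
L = \Sigma \;\Rightarrow\; \pi_i^{\mathrm{opt}} \propto \|\psi_i\|.
```

Equality in the Cauchy-Schwarz step holds exactly when $\pi_i$ is proportional to $\|L\Sigma^{-1}\psi_i\|$, so the lower bound is attained. Taking $L=\Sigma$ removes $\Sigma^{-1}$ from the probabilities, so they can be computed without inverting a matrix.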
Abstract
With the growing availability of large-scale biomedical data, it is often time-consuming or infeasible to directly perform traditional statistical analysis with relatively limited computing resources at hand. We propose a fast subsampling method to effectively approximate the full-data maximum partial likelihood estimator in Cox's model, which greatly reduces the computational burden when analyzing massive survival data. We establish the consistency and asymptotic normality of a general subsample-based estimator. The optimal subsampling probabilities, which have explicit expressions, are determined by minimizing the trace of the asymptotic variance-covariance matrix for a linearly transformed parameter estimator. We propose a two-step subsampling algorithm for practical implementation, which substantially reduces computing time compared with the full-data method. The asymptotic properties…
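The two-step scheme in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' algorithm: sampling with replacement, an inverse-probability-weighted Cox partial likelihood with Breslow handling of ties, and second-step probabilities proportional to the norm of per-observation Cox score residuals at a uniform pilot estimate (mixed with a uniform share `alpha`) are all assumptions. The function names `weighted_cox_fit`, `score_residuals`, and `two_step_subsample_cox` are hypothetical.

```python
import numpy as np

# A minimal sketch, NOT the paper's exact algorithm. Assumptions (hypothetical):
# - subsampling with replacement and inverse-probability weights 1/pi_i;
# - second-step probabilities proportional to the norm of Cox score residuals
#   at a uniform pilot estimate, mixed with uniform mass alpha for safety.


def _risk_set_end(t_desc):
    # Times sorted in decreasing order: the risk set {j: t_j >= t_i} is rows
    # 0..end[i], where end[i] is the last row with time >= t_i (handles ties).
    return np.searchsorted(-t_desc, -t_desc, side="right") - 1


def weighted_cox_fit(time, event, X, w, n_iter=50, tol=1e-10):
    """Newton-Raphson for a weighted Cox log partial likelihood (Breslow ties)."""
    order = np.argsort(-time, kind="stable")
    t, d, Z, ww = time[order], event[order], X[order], w[order]
    end = _risk_set_end(t)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = Z @ beta
        r = ww * np.exp(eta - eta.max())  # common rescaling cancels in ratios
        S0 = np.cumsum(r)[end]
        S1 = np.cumsum(r[:, None] * Z, axis=0)[end]
        S2 = np.cumsum(r[:, None, None] * Z[:, :, None] * Z[:, None, :], axis=0)[end]
        m = S1 / S0[:, None]  # weighted risk-set mean of covariates
        c = ww * d  # only (weighted) events contribute score terms
        grad = c @ (Z - m)
        info = np.einsum("i,ijk->jk", c,
                         S2 / S0[:, None, None] - m[:, :, None] * m[:, None, :])
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def score_residuals(time, event, X, beta):
    """Full-data Cox score residuals psi_i = int (x_i - m(t)) dM_i(t) at beta."""
    order = np.argsort(-time, kind="stable")
    t, d, Z = time[order], event[order], X[order]
    end = _risk_set_end(t)
    eta = Z @ beta
    e = np.exp(eta - eta.max())
    S0 = np.cumsum(e)[end]
    m = np.cumsum(e[:, None] * Z, axis=0)[end] / S0[:, None]
    # Event times t_j <= t_i are rows start[i]..n-1 in decreasing order;
    # A and B accumulate the Breslow hazard increments d_j / S0_j over them.
    start = np.searchsorted(-t, -t, side="left")
    A = np.cumsum((d / S0)[::-1])[::-1][start]
    B = np.cumsum(((d / S0)[:, None] * m)[::-1], axis=0)[::-1][start]
    psi = d[:, None] * (Z - m) - e[:, None] * (Z * A[:, None] - B)
    out = np.empty_like(psi)
    out[order] = psi  # back to the caller's row order
    return out


def two_step_subsample_cox(time, event, X, r0, r, alpha=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(time)
    # Step 1: uniform pilot subsample; equal weights give a pilot estimate.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_cox_fit(time[idx0], event[idx0], X[idx0], np.ones(r0))
    # Step 2: probabilities from pilot score-residual norms (assumed form).
    s = np.linalg.norm(score_residuals(time, event, X, beta0), axis=1)
    pi = (1 - alpha) * s / s.sum() + alpha / n
    idx = rng.choice(n, size=r, replace=True, p=pi)
    return weighted_cox_fit(time[idx], event[idx], X[idx], 1.0 / pi[idx])
```

For example, `two_step_subsample_cox(time, event, X, r0=500, r=3000)` runs Newton-Raphson only on a few thousand sampled rows and makes a single sorted pass over the full data to compute the probabilities. The uniform share `alpha` keeps every probability at least `alpha / n`, which bounds the weights `1 / pi` and keeps censored rows with small residuals available for the risk sets.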
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · Markov Chains and Monte Carlo Methods
