Variable Selection for Stratified Sampling Designs in Semiparametric Accelerated Failure Time Models with Clustered Failure Times
Ying Chen, Chuan-Fa Tang, Sy Han Chiou, Min Chen

TL;DR
This paper develops a regularized estimating method within the generalized estimating equation (GEE) framework for variable selection in semiparametric accelerated failure time (AFT) models with clustered failure times collected under stratified sampling designs, improving the accuracy of inference.
Contribution
It introduces a penalized GEE-based estimator for variable selection with stratified, clustered failure time data, establishes its theoretical properties, and demonstrates its practical advantages.
Findings
Outperforms existing methods that ignore the sampling bias or the within-cluster dependence.
Achieves the oracle property in variable selection.
Demonstrates its effectiveness through simulation studies and an application to a dental study.
Abstract
In large-scale epidemiological studies, statistical inference is often complicated by high-dimensional covariates under stratified sampling designs for failure times. Variable selection methods developed for full cohort data do not extend naturally to stratified sampling designs, and appropriate adjustments for the sampling scheme are necessary. Further challenges arise when the failure times are clustered and exhibit within-cluster dependence. As an alternative to the Cox proportional hazards (PH) model when the PH assumption is not valid, the penalized Buckley-James (BJ) estimating method for accelerated failure time (AFT) models can potentially handle within-cluster correlation in such settings by incorporating generalized estimating equation (GEE) techniques, though its practical implementation remains hindered by computational instability. We propose a regularized estimating method…
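The abstract cites the penalized Buckley-James approach as the existing AFT alternative to Cox regression. The Python sketch below illustrates that baseline idea only, not the authors' proposed estimator. Censored log times are replaced by their conditional expectations under a Kaplan-Meier estimate of the residual distribution, and a lasso fit is repeated on the imputed responses. Several assumptions are made for illustration. The weights `w` stand in for inverse sampling probabilities from a stratified design. Within-cluster correlation is ignored, so this is working independence rather than a GEE working correlation. All function names are hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Lasso shrinkage operator."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def weighted_lasso(X, y, w, lam, n_iter=200):
    """Weighted lasso with an unpenalized intercept, fitted by coordinate descent."""
    wn = w / w.sum()
    x_bar, y_bar = wn @ X, wn @ y
    Xc, yc = X - x_bar, y - y_bar          # weighted centering absorbs the intercept
    scale = wn @ Xc**2                     # weighted second moment of each column
    beta = np.zeros(X.shape[1])
    resid = yc.copy()
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            resid += Xc[:, j] * beta[j]    # partial residual without coordinate j
            z = wn @ (Xc[:, j] * resid)
            beta[j] = soft_threshold(z, lam) / scale[j]
            resid -= Xc[:, j] * beta[j]    # restore the full residual
    return y_bar - x_bar @ beta, beta


def km_jumps(resid, delta, w):
    """Jump sizes of a weighted Kaplan-Meier estimator of the residual distribution."""
    order = np.argsort(resid)
    r, d, ww = resid[order], delta[order], w[order]  # fancy indexing returns copies
    d[-1] = 1                              # treat the largest residual as uncensored
    at_risk = np.cumsum(ww[::-1])[::-1]    # weighted risk-set size at each residual
    surv, mass = 1.0, np.zeros(len(r))
    for k in range(len(r)):
        if d[k]:
            h = ww[k] / at_risk[k]         # weighted hazard at this residual
            mass[k] = surv * h
            surv *= 1.0 - h
    return r, mass


def penalized_buckley_james(X, log_y, delta, w, lam, n_outer=20):
    """Alternate Buckley-James imputation and a weighted lasso fit (fixed iterations)."""
    b0, beta = weighted_lasso(X, log_y, w, lam)  # start from the naive fit
    for _ in range(n_outer):
        fitted = b0 + X @ beta
        resid = log_y - fitted
        r, mass = km_jumps(resid, delta, w)
        y_star = log_y.copy()
        for i in np.flatnonzero(delta == 0):
            above = r > resid[i]
            tail = mass[above].sum()
            if tail > 0:                   # E[eps | eps > resid_i] under the KM estimate
                y_star[i] = fitted[i] + (r[above] * mass[above]).sum() / tail
        b0, beta = weighted_lasso(X, y_star, w, lam)
    return b0, beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, beta_true = 500, np.array([1.0, 0.0, 0.0, -0.5, 0.0])
    X = rng.normal(size=(n, 5))
    log_t = 1.0 + X @ beta_true + rng.normal(scale=0.5, size=n)
    log_c = rng.uniform(1.0, 5.0, size=n)
    delta = (log_t <= log_c).astype(int)
    log_y = np.minimum(log_t, log_c)
    w = rng.choice([1.0, 4.0], size=n)     # hypothetical inverse sampling probabilities
    b0, beta = penalized_buckley_james(X, log_y, delta, w, lam=0.05)
    print(np.round(beta, 2))
```

The imputed responses depend on the current fit, so the outer loop can oscillate rather than converge. This illustrates the kind of computational instability the abstract mentions. The sketch simply stops after a fixed number of iterations.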
