High-Dimensional Variable Selection and Prediction under Competing Risks with Application to SEER-Medicare Linked Data
Jiayi Hou, Anthony Paravati, Ronghui Xu, James Murphy

TL;DR
This paper evaluates high-dimensional variable selection and prediction methods for competing risks models, specifically cause-specific and subdistribution hazards, using simulations and real SEER-Medicare data for prostate cancer mortality prediction.
Contribution
It explores the statistical properties and predictive accuracy of existing methods in high-dimensional settings for competing risks, with application to cancer mortality data.
Findings
Optimal methods improve prediction accuracy
Variable selection performance varies across approaches
Application to SEER-Medicare data demonstrates practical utility
Abstract
Competing risk analysis considers event times due to multiple causes, or of more than one event type. Commonly used regression models for such data include 1) the cause-specific hazards model, which focuses on modeling one type of event while acknowledging other event types simultaneously; and 2) the subdistribution hazards model, which links the covariate effects directly to the cumulative incidence function. Their use, and in particular their statistical properties, in the presence of high-dimensional predictors are largely unexplored. Motivated by an analysis using the linked SEER-Medicare database for the purposes of predicting cancer versus non-cancer mortality for patients with prostate cancer, we study the accuracy of prediction and variable selection of existing statistical learning methods under both models using extensive simulation experiments, including different approaches to choosing…
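The two models named in the abstract can be written out as follows. This is a sketch in standard textbook notation, not the paper's own: the symbols T (event time), epsilon (event type), Z (covariate vector), beta_k, and gamma are assumptions introduced here for illustration, and the proportional-hazards form is the usual one rather than a specification confirmed by the text above.

```latex
% Notation assumed for illustration (not taken from the paper):
% T = event time, \epsilon \in \{1,\dots,K\} = event type, Z = covariate vector.

% 1) Cause-specific hazard for event type k, with a proportional-hazards form
\[
\lambda_k(t \mid Z)
  = \lim_{\Delta t \to 0}
    \frac{P(t \le T < t + \Delta t,\ \epsilon = k \mid T \ge t,\ Z)}{\Delta t},
\qquad
\lambda_k(t \mid Z) = \lambda_{k0}(t)\, \exp(\beta_k^\top Z).
\]

% Cumulative incidence function for event type k
\[
F_k(t \mid Z) = P(T \le t,\ \epsilon = k \mid Z).
\]

% 2) Subdistribution hazard for event type k, with a proportional-hazards form
\[
\tilde{\lambda}_k(t \mid Z)
  = -\frac{d}{dt} \log\{1 - F_k(t \mid Z)\},
\qquad
\tilde{\lambda}_k(t \mid Z) = \tilde{\lambda}_{k0}(t)\, \exp(\gamma^\top Z).
\]
```

Under the second form, 1 - F_k(t | Z) equals exp{-\tilde{\Lambda}_{k0}(t) exp(gamma^T Z)}, where \tilde{\Lambda}_{k0} is the cumulative baseline subdistribution hazard, so each coefficient acts directly on the cumulative incidence function. Under the first form, the cumulative incidence of type k depends on the cause-specific hazards of all event types, not only on lambda_k.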
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods in Clinical Trials · Machine Learning in Healthcare
