Survival ensembles by the sum of pairwise differences with application to lung cancer microarray studies
Brent A. Johnson, Qi Long

TL;DR
This paper introduces a boosting-based method for high-dimensional survival analysis that minimizes the sum of pairwise differences in residuals under an accelerated lifetime model. Applied to lung cancer microarray data, it identifies more survival-related genes than an ensemble built on the proportional hazards model.
Contribution
It proposes a new ensemble method for survival analysis and reports that it outperforms proportional hazards (PH) models, especially when the PH assumption is violated, as demonstrated on lung cancer microarray data.
Findings
The proposed ensemble identified 19 genes related to lung cancer survival.
The PH ensemble identified only 9 genes, all of which were among the 19 found by the proposed method.
PH models tend to underfit and miss important covariate effects.
Abstract
Lung cancer is among the most common cancers in the United States, in terms of incidence and mortality. In 2009, it is estimated that more than 150,000 deaths will result from lung cancer alone. Genetic information is an extremely valuable data source in characterizing the personal nature of cancer. Over the past several years, investigators have conducted numerous association studies where intensive genetic data is collected on relatively few patients compared to the numbers of gene predictors, with one scientific goal being to identify genetic features associated with cancer recurrence or survival. In this note, we propose high-dimensional survival analysis through a new application of boosting, a powerful tool in machine learning. Our approach is based on an accelerated lifetime model and minimizing the sum of pairwise differences in residuals. We apply our method to a recent…
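To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the "sum of pairwise differences in residuals" takes the familiar Gehan-type form for an accelerated lifetime model, with residuals e_i = log T_i - x_i'β and loss n^{-2} Σ_i Σ_j δ_i max(e_j - e_i, 0), where δ_i is the event indicator. The boosting loop is a generic componentwise gradient scheme with a fixed step size. All function names, the step size, and the number of steps are hypothetical choices for illustration.

```python
import numpy as np

def gehan_loss(beta, X, log_time, event):
    """Assumed Gehan-type loss: censoring-weighted sum of pairwise residual differences."""
    resid = log_time - X @ beta               # residuals e_i = log T_i - x_i' beta
    diff = resid[None, :] - resid[:, None]    # diff[i, j] = e_j - e_i
    # Only pairs whose first index i is an observed event (event[i] = 1) contribute.
    return np.sum(event[:, None] * np.maximum(diff, 0.0)) / len(resid) ** 2

def gehan_gradient(beta, X, log_time, event):
    """(Sub)gradient of gehan_loss with respect to beta."""
    resid = log_time - X @ beta
    # w[i, j] = delta_i * I(e_j > e_i)
    w = event[:, None] * (resid[None, :] > resid[:, None])
    # sum over pairs of w[i, j] * (x_i - x_j), computed via row and column sums of w
    return (w.sum(axis=1) @ X - w.sum(axis=0) @ X) / len(resid) ** 2

def componentwise_boost(X, log_time, event, n_steps=200, step=0.05):
    """Generic componentwise boosting: update one coefficient per iteration."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = gehan_gradient(beta, X, log_time, event)
        k = np.argmax(np.abs(grad))           # covariate with the steepest descent direction
        beta[k] -= step * np.sign(grad[k])    # small fixed-size step on that covariate only
    return beta

# Tiny illustrative data set (made up, no censoring): log T = x exactly.
X = np.array([[0.0], [1.0], [2.0]])
log_time = np.array([0.0, 1.0, 2.0])
event = np.array([1.0, 1.0, 1.0])
beta_hat = componentwise_boost(X, log_time, event)
```

Because only one coefficient moves per iteration, stopping early leaves most coefficients at exactly zero. In a scheme of this kind, the covariates with nonzero coefficients are the ones reported as selected.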
