# A Weibull mixture cure frailty model for high-dimensional covariates

**Authors:** Fatih Kızılaslan, David Michael Swanson, Valeria Vitelli

PMC · DOI: 10.1177/09622802251327687 · Statistical Methods in Medical Research · 2025-03-31

## TL;DR

A Weibull mixture cure frailty model is developed for censored survival data with a cured fraction and high-dimensional predictors, and applied to breast cancer gene expression data from TCGA.

## Contribution

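The Weibull mixture cure model is extended with a frailty component that captures latent population heterogeneity, and high-dimensional covariates enter both the cure-rate and survival parts of the model. Variables are selected with an adaptive elastic-net penalty, and inference uses an expectation–maximization (EM) algorithm.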
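
As a rough illustration of the penalization step, the sketch below evaluates an adaptive elastic-net penalty in one commonly used form: `lam * (alpha * sum(w_j * |beta_j|) + (1 - alpha) / 2 * sum(beta_j ** 2))`, with adaptive weights `w_j = 1 / (|beta_init_j| + eps) ** gamma`. The function names, the `eps` guard, and this exact parameterization are assumptions made for illustration; the paper's own penalty may be scaled or weighted differently.

```python
def adaptive_weights(beta_init, gamma=1.0, eps=1e-8):
    """Adaptive weights from an initial estimate (hypothetical helper).

    Coefficients that were large in the initial fit get small weights,
    so they are penalized less in the L1 term.
    """
    return [1.0 / (abs(b) + eps) ** gamma for b in beta_init]


def adaptive_elastic_net_penalty(beta, weights, lam, alpha):
    """Weighted L1 term plus a ridge term, mixed by alpha in [0, 1]."""
    l1_part = sum(w * abs(b) for w, b in zip(weights, beta))
    l2_part = sum(b * b for b in beta)
    return lam * (alpha * l1_part + 0.5 * (1.0 - alpha) * l2_part)


# Illustrative values only, not taken from the paper.
beta = [1.0, -2.0, 0.0]
beta_init = [0.5, 2.0, 0.1]
weights = adaptive_weights(beta_init)
penalty = adaptive_elastic_net_penalty(beta, weights, lam=1.0, alpha=0.5)
```

In a model with covariates in both the cure-rate and survival parts, a penalty of this kind would typically be applied to each coefficient vector, possibly with separate tuning parameters.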

## Key findings

- In simulation studies across various scenarios, the proposed method outperforms competitor models.
- A set of prognostic biomarkers was derived from genes selected in TCGA breast cancer RNAseq data, and validated via functional enrichment analysis and comparison to the existing biological literature.
- A prognostic risk score index based on the identified biomarkers was proposed and validated by exploring the patients' survival.

## Abstract

A novel mixture cure frailty model is introduced for handling censored survival data. Mixture cure models are preferable when the existence of a cured fraction among patients can be assumed. However, such models remain underexplored: frailty structures within cure models are largely undeveloped, and most existing methods do not work for high-dimensional datasets, where the number of predictors is much larger than the number of observations. In this study, we introduce a novel extension of the Weibull mixture cure model that incorporates a frailty component, which models an underlying latent population heterogeneity with respect to the outcome risk. Additionally, high-dimensional covariates are integrated into both the cure rate and survival part of the model, so that the model can be applied to high-dimensional omics data. We also perform variable selection via an adaptive elastic-net penalization, and propose a novel approach to inference using the expectation–maximization (EM) algorithm. Extensive simulation studies are conducted across various scenarios to demonstrate the performance of the model, and results indicate that our proposed method outperforms competitor models. We apply the novel approach to analyze bulk RNAseq gene expression data from breast cancer patients included in The Cancer Genome Atlas (TCGA) database. A set of prognostic biomarkers is then derived from selected genes, and subsequently validated via both functional enrichment analysis and comparison to the existing biological literature. Finally, a prognostic risk score index based on the identified biomarkers is proposed and validated by exploring the patients’ survival.
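
To make the model structure in the abstract concrete, here is a minimal sketch of a Weibull mixture cure survival function with a frailty term. It rests on several assumptions not stated in this summary: a gamma frailty with mean 1 and variance `theta`, a proportional-hazards Weibull baseline with cumulative hazard `(t / scale) ** shape`, and a logistic model for the probability of being uncured. All function and parameter names are hypothetical. The last function is the standard E-step weight for latent cure status in mixture cure models; the paper's EM scheme also handles frailty and penalization and may differ.

```python
import math


def uncured_survival(t, lin_pred, shape, scale, theta):
    """Marginal survival of susceptible (uncured) subjects.

    Assumes a Weibull proportional-hazards model with gamma frailty
    (mean 1, variance theta); theta = 0 means no frailty.
    """
    cum_hazard = (t / scale) ** shape * math.exp(lin_pred)
    if theta <= 0.0:
        return math.exp(-cum_hazard)
    return (1.0 + theta * cum_hazard) ** (-1.0 / theta)


def uncured_probability(lin_pred_cure):
    """Logistic model for the probability of being uncured (assumed link)."""
    return 1.0 / (1.0 + math.exp(-lin_pred_cure))


def population_survival(t, lin_pred, lin_pred_cure, shape, scale, theta):
    """Mixture of cured subjects (who never fail) and uncured subjects."""
    p = uncured_probability(lin_pred_cure)
    return (1.0 - p) + p * uncured_survival(t, lin_pred, shape, scale, theta)


def e_step_uncured_weight(t, event, lin_pred, lin_pred_cure,
                          shape, scale, theta):
    """Posterior probability that a subject is uncured.

    Subjects with an observed event are uncured by definition; censored
    subjects get a weight between 0 and 1.
    """
    if event:
        return 1.0
    p = uncured_probability(lin_pred_cure)
    s_u = uncured_survival(t, lin_pred, shape, scale, theta)
    return p * s_u / ((1.0 - p) + p * s_u)
```

Under these assumptions the population survival curve starts at 1 and levels off at the cured fraction `1 - p` as `t` grows, which is the plateau that motivates cure models in the first place.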

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Diseases:** breast cancer (MESH:D001943), Cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12209551/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12209551/full.md

## References

70 references — full list in the complete paper: https://tomesphere.com/paper/PMC12209551/full.md

---
Source: https://tomesphere.com/paper/PMC12209551