# A Unified Framework for Survival Prediction: Combining Machine Learning Feature Selection with Traditional Survival Analysis in Heart Failure and METABRIC Breast Cancer

**Authors:** Fangya Tan, Jian-Guo Zhou, Shuqiao Li, Bowen Long, Srikar Bellur, Yang Zhou, Mark Newman

PMC · DOI: 10.3390/diagnostics16050790 · Diagnostics · 2026-03-06

## TL;DR

This paper introduces a framework that combines machine learning with traditional survival analysis to improve clinical risk prediction in heart failure and breast cancer.

## Contribution

The novel contribution is a unified framework that integrates ML feature selection with interpretable survival analysis for robust and generalizable risk stratification.

## Key findings

- In the heart failure dataset, age group, serum creatinine, and blood pressure stratified patients into distinct risk groups; the high-risk group had significantly higher mortality (HR: 2.61; 95% CI: 1.42–4.78; p = 0.0013).
- In the METABRIC breast cancer cohort, age at diagnosis, HER2 status, and Nottingham Prognostic Index (NPI) separated risk groups (HR: 2.73; 95% CI: 2.34–3.19), with median survival of 104.7 vs. 252.3 months, a difference of 12.3 years.
- Performance remained stable across both datasets, with C-index values of 0.665–0.794, which the authors describe as consistent with standard clinical benchmarks.
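
For readers unfamiliar with the C-index reported above, the following is a minimal pure-Python sketch of Harrell's concordance index. It is an illustration only: this summary does not state which C-index variant or software the authors used, the function name `concordance_index` and the toy data are hypothetical, and pairs with tied observed times are simply skipped for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the subject with the
    shorter observed time also has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification (assumption): skip tied times; also skip pairs whose
        # shorter time is censored, because their true order is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration (not taken from the paper).
toy_times = [5, 8, 12, 20, 25]          # follow-up time
toy_events = [1, 1, 0, 1, 0]            # 1 = death observed, 0 = censored
toy_risk = [0.9, 0.4, 0.7, 0.3, 0.1]    # higher = predicted higher risk

toy_c = concordance_index(toy_times, toy_events, toy_risk)
print(f"C-index on toy data: {toy_c:.3f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported range of 0.665–0.794 should be read.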

## Abstract

Background: The clinical use of machine learning (ML) in survival analysis is often limited by the “black box” nature of complex algorithms, which makes their results difficult to interpret in practice. In this study, we propose a unified and clinically grounded framework that integrates ML-based feature selection with traditional survival analysis. This approach aims to bridge the gap between strong predictive performance and clear, clinically meaningful interpretation.

Methods: High-impact prognostic clinical features were identified using the ML models GBM-Cox, RSF, and LASSO-Cox with 5-fold stratified cross-validation and were subsequently validated using Cox Proportional Hazards and Kaplan–Meier analysis. The framework was evaluated across two distinct disease domains, Heart Failure and the METABRIC breast cancer cohort, to assess robustness and generalizability.

Results: In the Heart Failure dataset, age group, serum creatinine, and blood pressure stratified patients into distinct risk groups. The high-risk group had significantly higher mortality (HR: 2.61; 95% CI: 1.42–4.78; p = 0.0013). In the METABRIC cohort, age at diagnosis, HER2 status, and Nottingham Prognostic Index (NPI) showed strong survival separation (p < 0.001). The high-risk group had an HR of 2.73 (95% CI: 2.34–3.19) and faced a significantly shorter median survival (104.7 vs. 252.3 months), representing a 12.3-year reduction in life expectancy compared with the low-risk group. This prognostic separation emphasizes the predictive power of the selected baseline variables. Performance remained stable across cohorts, with C-index values (0.665–0.794) consistent with standard clinical benchmarks.

Conclusions: Integrating cross-validated machine learning feature selection with Cox-based survival analysis enables stable and clinically interpretable risk stratification across diseases. By translating ML-selected predictors into hazard ratios and absolute survival differences, this framework provides a reproducible and clinically grounded approach for survival risk assessment.
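
The median survival comparison in the Results rests on Kaplan–Meier estimates for each risk group. The sketch below shows, under stated assumptions, how such a median can be read off a Kaplan–Meier curve. It is not the authors' code: the helper names `kaplan_meier` and `median_survival` and the two toy groups are hypothetical, and only the final arithmetic check uses figures from the abstract.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk  # product-limit step
            curve.append((t, surv))
        n_at_risk -= removed  # events and censored both leave the risk set
    return curve


def median_survival(curve):
    """First time at which S(t) drops to 0.5 or below; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy groups, invented for illustration (not taken from the paper).
low_median = median_survival(
    kaplan_meier([10, 20, 30, 40, 50, 60], [1, 0, 1, 1, 0, 1]))
high_median = median_survival(
    kaplan_meier([5, 8, 12, 15, 30], [1, 1, 1, 0, 1]))
print(f"toy medians: low-risk={low_median}, high-risk={high_median}")

# Arithmetic check on the medians reported in the abstract (months -> years).
gap_years = (252.3 - 104.7) / 12
print(f"reported median gap: {gap_years:.1f} years")
```

The last two lines confirm that the 12.3-year figure is the gap between the two reported medians: 252.3 − 104.7 = 147.6 months, which is 12.3 years.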

## Linked entities

- **Diseases:** heart failure (MONDO:0005252), breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}
- **Diseases:** Heart Failure (MESH:D006333), Breast Cancer (MESH:D001943)
- **Chemicals:** creatinine (MESH:D003404)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12985148/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12985148/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC12985148/full.md

---
Source: https://tomesphere.com/paper/PMC12985148