# Integrative Machine Learning Model for Overall Survival Prediction in Breast Cancer Using Clinical and Transcriptomic Data

**Authors:** Mehmet Kivrak, Hatice Sevim Nalkiran, Oguzhan Kesen, Ihsan Nalkiran

PMC · DOI: 10.3390/biology14111539 · 2025-11-03

## TL;DR

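This study combines clinical and transcriptomic data from the METABRIC cohort in machine learning models to predict overall survival in Luminal A breast cancer. The best model, XGBoost, reached 98% accuracy (AUC 0.86), and the authors report better prognostic accuracy than conventional methods.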

## Contribution

An integrative machine learning model combining clinical and transcriptomic data for improved survival prediction in Luminal A breast cancer.

## Key findings

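- XGBoost performed best among the four models tested (accuracy 98%, sensitivity 98%, specificity 97%, F1-score 0.99, AUC 0.86).
- Differential expression analysis identified 41 genes associated with age and survival, and principal component analysis showed distinct clustering across age groups.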
- Combining clinical and genomic variables improved prognostic accuracy compared to conventional methods.
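
For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, and F1-score follow from a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the class `ConfusionCounts` and function `classification_metrics` are names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the threshold-dependent metrics named in the key findings."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)   # recall on the positive class
    specificity = _ratio(c.tn, c.tn + c.fp)   # recall on the negative class
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical counts for illustration only (NOT the paper's results).
counts = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included because it cannot be derived from a single confusion matrix: it is computed from the ranking of predicted scores across all thresholds. This is one reason an AUC (0.86 here) can differ noticeably from accuracy (98%) for the same model.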

## Abstract

Breast cancer is the most common cancer in women, and the Luminal A type is usually linked to better survival. However, age and menopause can affect how the disease behaves and how patients respond to treatment. In this study, we looked at both genetic information from tumors and clinical features such as age, tumor size, and treatments. Women with Luminal A breast cancer were divided into younger, older, and elderly groups. We found that gene activity differed between these groups and that some genes and clinical features were closely related to survival. By using computer-based learning methods, we created models that combined both genetic and clinical data. These models predicted survival more accurately than traditional methods. Our results suggest that, in future, considering both age-related genetic changes and clinical features may help doctors make better treatment decisions and improve outcomes for women with this type of breast cancer.

Breast cancer is the most common malignancy in women, with the Luminal A subtype generally associated with favorable survival. However, age and menopausal status may influence tumor biology and prognosis. To improve prediction beyond conventional models, we analyzed transcriptomic and clinical data from the METABRIC cohort. Patients with Luminal A breast cancer were stratified into premenopausal, postmenopausal–nongeriatric, and geriatric (≥70 years) groups. Differentially expressed genes (DEGs) were identified, and Boruta feature selection revealed 27 clinical and genomic variables. Random Forest, Logistic Regression, Multilayer Perceptron, and ensemble XGBoost models were trained with stratified 5-fold cross-validation, using SMOTE to correct class imbalance. Principal component analysis showed distinct clustering across age groups, while DEG analysis revealed 41 genes associated with age and survival. Key predictors included clinical variables (age, tumor size, NPI, radiotherapy) and molecular markers (ATM, HERC2, AKT2, FOXO3, CYP3A43). Among ML models, XGBoost demonstrated the highest performance (accuracy 98%, sensitivity 98%, specificity 97%, F1-score 0.99, AUC 0.86), outperforming other algorithms. These findings indicate that age-related transcriptomic changes impact survival in Luminal A breast cancer and that an ML-based integrative approach combining clinical and molecular variables provides superior prognostic accuracy, supporting its potential for clinical application.
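
The combination of stratified 5-fold cross-validation and SMOTE described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline. It assumes that oversampling is applied to the training folds only, leaving each test fold untouched (the abstract does not say where SMOTE was applied). The helper `smote_like` is a simplified stand-in that interpolates between random pairs of minority samples, whereas real SMOTE interpolates toward nearest minority neighbours. All names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, dealing each class out round-robin so
    every fold keeps roughly the original class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def smote_like(X, y, minority, rng):
    """Balance a binary training set by adding synthetic minority rows.
    Simplification (assumption): each synthetic row lies on the segment
    between two randomly chosen minority rows, not nearest neighbours."""
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        a, b = rng.sample(minority_rows, 2)
        t = rng.random()
        X_out.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_out.append(minority)
    return X_out, y_out


# Toy imbalanced data (hypothetical): 30 majority rows, 10 minority rows.
rng = random.Random(42)
y = [0] * 30 + [1] * 10
X = [[rng.gauss(float(label), 1.0), rng.gauss(0.0, 1.0)] for label in y]

folds = stratified_folds(y, k=5, seed=42)
fold_summaries = []
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(y)) if i not in held_out]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    # Oversample the training portion only; the test fold stays as observed.
    X_bal, y_bal = smote_like(X_train, y_train, minority=1, rng=rng)
    # A classifier would be fitted on (X_bal, y_bal) and scored on test_idx.
    fold_summaries.append(
        (y_bal.count(0), y_bal.count(1), sum(y[i] for i in test_idx), len(test_idx))
    )

for summary in fold_summaries:
    print(summary)  # (train majority, train minority, test minority, test size)
```

Keeping the oversampling inside the loop means synthetic rows never appear in, or are derived from, the fold used for scoring. Oversampling before splitting would let information from test samples leak into training and can inflate the reported metrics.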

## Linked entities

- **Genes:** ATM (ATM serine/threonine kinase) [NCBI Gene 472], HERC2 (HECT and RLD domain containing E3 ubiquitin protein ligase 2) [NCBI Gene 8924], AKT2 (AKT serine/threonine kinase 2) [NCBI Gene 208], FOXO3 (forkhead box O3) [NCBI Gene 2309], CYP3A43 (cytochrome P450 family 3 subfamily A member 43) [NCBI Gene 64816]
- **Diseases:** breast cancer (MONDO:0004989), Luminal A breast cancer (MONDO:0021116)

## Full-text entities

- **Genes:** ATM (ATM serine/threonine kinase) [NCBI Gene 472] {aka AT1, ATA, ATC, ATD, ATDC, ATE}, HERC2 (HECT and RLD domain containing E3 ubiquitin protein ligase 2) [NCBI Gene 8924] {aka D15F37S1, MRT38, SHEP1, jdf2, p528}, CYP3A43 (cytochrome P450 family 3 subfamily A member 43) [NCBI Gene 64816], FOXO3 (forkhead box O3) [NCBI Gene 2309] {aka AF6q21, FKHRL1, FKHRL1P2, FOXO2, FOXO3A}, AKT2 (AKT serine/threonine kinase 2) [NCBI Gene 208] {aka HIHGHH, PKBB, PKBBETA, PRKBB, RAC-BETA}
- **Diseases:** Breast Cancer (MESH:D001943), malignancy (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12650249/full.md

---
Source: https://tomesphere.com/paper/PMC12650249