# Exploring the recurrence and metastasis of breast invasive ductal carcinoma based on machine learning and survival analysis

**Authors:** Aqiao Xu, Xiaobo Weng, Jing Zheng, Qian Cui, Gaoyan He, Yongran Cheng, Haitao Jiang, Mingzhu Wei, Shengjian Zhang

PMC · DOI: 10.3389/fonc.2026.1734379 · Frontiers in Oncology · 2026-03-13

## TL;DR

This study uses machine learning to predict recurrence and metastasis in patients with breast invasive ductal carcinoma, with the aim of supporting risk assessment and treatment decisions.

## Contribution

The study develops machine learning models that predict recurrence and metastasis in IDC and evaluates them on three cohorts from separate centers (training n = 303, validation n = 217, test n = 120).

## Key findings

- The XGBoost model achieved AUCs of 0.842 (training), 0.848 (validation), and 0.912 (test), with 93.8% accuracy on the independent test set.
- Rad-score, Ki-67 index, lymph node metastasis, tumor histological grade, and breast cancer family history were significantly associated with recurrence-free survival in multivariate Cox regression (all p < 0.05).
- Patients with a family history of breast cancer showed a distinct pattern of bone metastasis.

## Abstract

Invasive ductal carcinoma (IDC), the predominant histopathological subtype comprising about 80% of breast malignancies, continues to pose a significant clinical challenge due to frequent recurrence. Existing relapse prediction models remain limited in accuracy and generalizability. This study aimed to construct and validate machine learning–based models for predicting 5-year (short- to medium-term) recurrence and metastasis risk in IDC, based on recurrence-free survival (RFS) analysis.

A total of 640 IDC cases diagnosed between January 2017 and December 2019 were enrolled. Data were partitioned into three sets: the training set (n = 303) from Fudan University Shanghai Cancer Center; the validation set (n = 217) from Shaoxing Central Hospital; and the test set (n = 120) from Zhejiang Cancer Hospital. Independent prognostic factors were identified through univariate and multivariate Cox regression analyses. Three predictive strategies were implemented: evaluating recurrence risk, distinguishing local from distant recurrence, and identifying metastatic sites. Six classifiers were trained and validated: Light Gradient Boosting Machine (LGBM), XGBoost (XGB), Random Forest (RF), k-Nearest Neighbor (KNN), Neural Network (NN), and Support Vector Machine (SVM).
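The Cox regression step can be illustrated with a minimal sketch of the quantity such a model minimizes, the negative log partial likelihood. The abstract does not say which software or tie-handling method was used, so everything here is an assumption for illustration: the function name, the Breslow-style treatment of ties, and the toy follow-up data are hypothetical and are not taken from the paper.

```python
import math


def cox_neg_log_partial_likelihood(beta, times, events, covariates):
    """Negative log partial likelihood of a Cox model (Breslow-style ties).

    beta       -- list of coefficients, one per covariate
    times      -- follow-up time for each patient
    events     -- 1 if recurrence was observed, 0 if censored
    covariates -- list of covariate rows, one row per patient
    """
    # Linear predictor eta_i = beta . x_i for every patient.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]

    nll = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(
            math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i
        )
        nll -= eta[i] - math.log(risk_sum)
    return nll


# Toy data (hypothetical): one binary covariate, four patients.
toy_times = [1.0, 2.0, 3.0, 4.0]
toy_events = [1, 1, 1, 1]
toy_x = [[1], [1], [0], [0]]  # patients with x = 1 relapse earlier

nll_null = cox_neg_log_partial_likelihood([0.0], toy_times, toy_events, toy_x)
nll_beta1 = cox_neg_log_partial_likelihood([1.0], toy_times, toy_events, toy_x)
```

In this toy example the covariate marks the patients who relapse first, so a positive coefficient fits better than zero (`nll_beta1 < nll_null`). A fitted coefficient is usually reported as a hazard ratio, `math.exp(beta)`.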

The median follow-up duration was 5.7 years. Multivariate Cox regression analyses identified multiple factors significantly associated with RFS, including the rad-score, Ki-67 index, lymph node metastasis, tumor histological grade, and breast cancer family history in first- or second-degree relatives (all p < 0.05). In contrast, age, menopausal status, and molecular subtype showed no significant association with recurrence risk in this cohort (p = 0.987, p = 0.987, and p = 0.960, respectively). The clinical-radiomic nomogram demonstrated strong performance in predicting IDC recurrence. The XGBoost model demonstrated robust and consistent predictive performance across all cohorts, achieving AUCs of 0.842, 0.848, and 0.912 on the training, validation, and test sets, respectively. On the independent test set, the model attained an accuracy of 93.8%, sensitivity of 96.3%, and specificity of 79.6%. Furthermore, density plots of the radiomic score and Ki-67 index effectively differentiated between local recurrence, bone metastasis, and metastases to other organs. Patients with lymph node metastasis and high histological grade demonstrated a higher frequency of metastases to distant organs, accounting for most cases and emphasizing the contrast with local recurrence and bone metastasis. Patients with a breast cancer family history displayed a distinct pattern of bone metastasis.
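The performance figures above (AUC, accuracy, sensitivity, specificity) follow standard definitions, sketched below in plain Python. This is an illustration only: the helper names, the 0.5 decision threshold, and the five-case toy labels are hypothetical, and the abstract does not describe how the authors computed these metrics.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels (1 = event)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc_rank(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels and model scores).
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.4, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

toy_metrics = confusion_metrics(toy_labels, toy_preds)
toy_auc = auc_rank(toy_labels, toy_scores)
```

AUC depends only on how the scores rank the cases, whereas accuracy, sensitivity, and specificity all change with the chosen threshold. This is why the abstract reports AUC for every cohort alongside the thresholded metrics for the test set.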

This study underscores the utility of machine learning models in forecasting recurrence and metastatic behavior in IDC. The clinical-radiomic nomograms proved valuable for individualized surgical and therapeutic decision-making in IDC patients.

## Linked entities

- **Diseases:** breast invasive ductal carcinoma (MONDO:0004953), breast cancer (MONDO:0004989)

## Full-text entities

- **Diseases:** bone metastasis (MESH:D009362), IDC (MESH:D044584), breast cancer (MESH:D001943), lymph node metastasis (MESH:D008207), breast invasive ductal carcinoma (MESH:D018270), Cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13021447/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13021447/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC13021447/full.md

---
Source: https://tomesphere.com/paper/PMC13021447