# Leveraging Deep Learning, Grid Search, and Bayesian Networks to Predict Distant Recurrence of Breast Cancer

**Authors:** Xia Jiang, Yijun Zhou, Alan Wells, Adam Brufsky

PMC · DOI: 10.3390/cancers17152515 · Cancers · 2025-07-30

## TL;DR

This study uses an interpretable machine learning pipeline to predict whether breast cancer will recur at distant sites at 5, 10, and 15 years after initial diagnosis, improving accuracy and transparency for long-term patient care.

## Contribution

A novel interpretable machine learning pipeline that combines Bayesian network-based feature selection, deep feed-forward neural networks, and grid search hyperparameter tuning to improve long-term prediction of distant breast cancer recurrence.

## Key findings

- The best models achieved area under the curve (AUC) scores of 0.79, 0.83, and 0.89 for 5-, 10-, and 15-year recurrence predictions, respectively.
- The percentage performance improvement attributed to grid search hyperparameter tuning ranged from 25.3% to 60%.
- The Markov blanket and interactive risk factor learner (MBIL) reduced input dimensionality by over 80% without sacrificing accuracy.
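
To make the AUC figures above concrete, here is a minimal sketch of how an AUC can be computed from predicted risk scores and observed outcomes. It uses the pairwise (Mann-Whitney) definition: the fraction of positive/negative pairs in which the recurrence case receives the higher score. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: chance that a random positive case outranks a random negative case.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = distant recurrence observed, 0 = no recurrence.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported values indicate how often the model ranks a recurrence case above a non-recurrence case.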

## Abstract

Breast cancer can return even years after initial successful treatment, which makes predicting long-term recurrence very challenging. Currently available tools are not very accurate in predicting these late recurrences. In this study, we developed an advanced method using artificial intelligence to accurately predict whether breast cancer might recur at 5, 10, and 15 years after initial diagnosis. Our approach combines sophisticated techniques to identify the most relevant clinical factors, deep learning models to make precise predictions, and special methods to clearly explain how the predictions were made. By testing this method using existing medical records of breast cancer patients, we showed significantly improved prediction accuracy compared to some traditional methods. This approach can help clinicians better identify patients at high risk of recurrence and provide transparency in decision-making, potentially improving patient outcomes by guiding appropriate long-term monitoring and personalized treatment strategies.

**Background:** Unlike most cancers, breast cancer poses a persistent risk of distant recurrence, often years after initial treatment, making long-term risk stratification uniquely challenging. Current tools fall short in predicting late metastatic events, particularly for early-stage patients.

**Methods:** We present an interpretable machine learning (ML) pipeline to predict distant recurrence-free survival at 5, 10, and 15 years, integrating Bayesian network-based causal feature selection, deep feed-forward neural network models (DNMs), and SHAP-based interpretation. Using electronic health record (EHR)-based clinical data from over 6000 patients, we first applied the Markov blanket and interactive risk factor learner (MBIL) to identify minimally sufficient predictor subsets. These were then used to train optimized DNM classifiers, with hyperparameters tuned via grid search and benchmarked against models from 10 traditional ML methods and models trained using all predictors.

**Results:** Our best models achieved area under the curve (AUC) scores of 0.79, 0.83, and 0.89 for 5-, 10-, and 15-year predictions, respectively, substantially outperforming baselines. MBIL reduced input dimensionality by over 80% without sacrificing accuracy. Importantly, MBIL-selected features (e.g., nodal status, hormone receptor expression, tumor size) overlapped strongly with top SHAP contributors, reinforcing interpretability. Calibration plots further demonstrated close agreement between predicted probabilities and observed recurrence rates. The percentage performance improvement due to grid search ranged from 25.3% to 60%.

**Conclusions:** This study demonstrates that combining causal selection, deep learning, and grid search improves prediction accuracy, transparency, and calibration for long-horizon breast cancer recurrence risk. The proposed framework is well-positioned for clinical use, especially to guide long-term follow-up and therapy decisions in early-stage patients.
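
The grid search step mentioned in the abstract can be sketched generically: enumerate every combination in a hyperparameter grid, score each one, and keep the best. The sketch below is only an illustration under stated assumptions. The hyperparameter names, their candidate values, and `toy_validation_score` (a stand-in for training a network and measuring validation AUC) are hypothetical and are not taken from the paper.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Exhaustively score every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: 3 x 3 x 3 = 27 candidate configurations.
param_grid = {
    "hidden_layers": [1, 2, 3],
    "learning_rate": [0.001, 0.01, 0.1],
    "dropout": [0.0, 0.2, 0.5],
}


def toy_validation_score(params):
    """Placeholder for 'train a model, return validation AUC' (assumption).

    A real pipeline would fit a classifier here; this toy function simply
    peaks at 2 hidden layers, learning rate 0.01, and dropout 0.2.
    """
    return (
        1.0
        - 0.1 * abs(params["hidden_layers"] - 2)
        - 5.0 * abs(params["learning_rate"] - 0.01)
        - 0.5 * abs(params["dropout"] - 0.2)
    )


best_params, best_score = grid_search(param_grid, toy_validation_score)
print(best_params, best_score)
```

Because grid search evaluates every combination, its cost grows multiplicatively with each added hyperparameter, which is one reason reducing the input feature set before tuning can be attractive.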

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** cancers (MESH:D009369), nodal (MESH:D013611), Breast Cancer (MESH:D001943)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12346417/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12346417/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/PMC12346417/full.md

---
Source: https://tomesphere.com/paper/PMC12346417