# Development and validation of a multifactorial risk prediction model for breast cancer patients with co-occurring thyroid cancer: a retrospective matched case-control study

**Authors:** Junming Yin, Zhiwei Guo, Wen Yi, Ying He, Yi Luo, Kepeng Zhu, Songlin Yuan, Guocheng Du

PMC · DOI: 10.3389/fonc.2026.1772910 · 2026-03-17

## TL;DR

This study developed a machine learning model that predicts the risk of co-occurring thyroid cancer in breast cancer patients from factors such as radiotherapy history, TSH level, ER status, family history of thyroid cancer, and age.

## Contribution

A novel XGBoost-based risk prediction model for co-occurring thyroid cancer in breast cancer patients was developed and internally validated on a held-out test set.

## Key findings

- The XGBoost model achieved a test-set AUC of 0.874 and an accuracy of 86.7% for predicting thyroid cancer co-occurrence, outperforming logistic regression, random forest, and SVM.
- Radiotherapy history, elevated TSH, ER-positive status, family history of thyroid cancer, and younger age were independent risk factors in multivariate analysis.
- The model discriminated better in patients with a history of radiotherapy (AUC = 0.921).

## Abstract

This study aimed to develop and validate a multifactorial machine learning model that predicts the risk of thyroid cancer (TC) co-occurrence in breast cancer (BC) patients.

This single-center retrospective matched case-control study analyzed 400 BC patients (200 with co-occurring TC and 200 matched BC-only controls) diagnosed between 2012 and 2025. Candidate predictors included demographic, clinical, hormonal, and tumor biological variables. After LASSO regression was used for feature selection and to handle multicollinearity, four machine learning algorithms (logistic regression, random forest, XGBoost, and SVM) were trained and optimized with Bayesian hyperparameter tuning under 5-fold cross-validation. Model performance was evaluated on an independent test set comprising 30% of the data, using AUC-ROC, calibration curves, and decision curve analysis.
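The partitioning described above (a 30% hold-out test set, with 5-fold cross-validation run only on the remaining training data) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the authors' code: it assumes a simple class-stratified split, whereas the abstract does not say whether matched case-control pairs were kept together (a pair-aware split would shuffle pair IDs rather than individual patients). The names `stratified_split` and `stratified_kfold` are hypothetical, and the LASSO selection and Bayesian tuning steps, which would run on the `fit`/`val` folds only to avoid leaking test data, are not shown.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.3, seed=42):
    """Split patient indices into train/test, preserving the case:control ratio.

    ASSUMPTION: plain class stratification; the study's exact split
    procedure (e.g. keeping matched pairs together) is not described.
    labels: list of 0/1 (1 = BC with co-occurring TC, 0 = BC-only control).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])   # first n_test shuffled indices go to the test set
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(indices, labels, k=5, seed=42):
    """Yield (fit_idx, val_idx) pairs for stratified k-fold CV on the training indices."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        val = sorted(folds[f])
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, val


if __name__ == "__main__":
    labels = [1] * 200 + [0] * 200  # 200 cases, 200 matched controls
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))  # 280 120
    for fit_idx, val_idx in stratified_kfold(train_idx, labels):
        print(len(fit_idx), len(val_idx))  # 224 56 for each of the 5 folds
```

With 200 cases and 200 controls, the test set holds 60 of each, and each cross-validation fold validates on 28 cases and 28 controls from the 280 training patients.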

Multivariate analysis identified the following independent risk factors for TC co-occurrence: radiotherapy history (aOR = 3.42, 95% CI: 2.14–5.46), elevated TSH level (aOR = 2.01 per µIU/mL, 95% CI: 1.65–2.45), ER-positive status (aOR = 2.47, 95% CI: 1.43–4.28), family history of TC (aOR = 3.05, 95% CI: 1.55–6.00), and younger age at BC diagnosis (aOR = 1.07 per year decrease, 95% CI: 1.04–1.10). The XGBoost model showed the best discrimination among the four algorithms (test AUC = 0.874, 95% CI: 0.836–0.934), with 86.7% accuracy, 83.3% sensitivity, and 90.0% specificity. Subgroup analysis showed higher discrimination in patients with a history of radiotherapy (AUC = 0.921). Decision curve analysis supported clinical utility across threshold probabilities of 20–80%, where the model provided a superior net benefit for personalized risk stratification.
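The reported figures can be unpacked with a little arithmetic. The sketch below (helper names are hypothetical) shows three things: (1) how a per-unit adjusted odds ratio from a logistic model scales to a larger change, since coefficients add on the log-odds scale; (2) a confusion matrix consistent with the reported test metrics, *assuming* the 120-patient test set contained 60 cases and 60 controls (the abstract does not give the counts); and (3) the standard decision-curve net benefit formula, NB = TP/n − (FP/n)·pt/(1 − pt). The net-benefit inputs are illustrative only; they are not the study's values at any particular threshold.

```python
import math


def rescale_or(or_per_unit: float, units: float) -> float:
    """OR for a change of `units` units, given a per-unit OR from a logistic model.

    Log-odds are additive, so OR_k = exp(k * log(OR_1)) = OR_1 ** k.
    """
    return math.exp(units * math.log(or_per_unit))


def joint_or(*ors: float) -> float:
    """Joint OR for several exposures, assuming a main-effects model (no interactions)."""
    return math.prod(ors)


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, and accuracy from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def net_benefit(tp: int, fp: int, n: int, pt: float) -> float:
    """Decision-curve net benefit at threshold probability pt."""
    return tp / n - (fp / n) * pt / (1 - pt)


if __name__ == "__main__":
    # Ten years younger at BC diagnosis (aOR 1.07 per year decrease):
    print(round(rescale_or(1.07, 10), 2))  # 1.97
    # TSH 2 µIU/mL higher (aOR 2.01 per µIU/mL):
    print(round(rescale_or(2.01, 2), 2))  # 4.04
    # Radiotherapy history plus family history of TC, no interaction assumed:
    print(round(joint_or(3.42, 3.05), 2))  # 10.43

    # ASSUMPTION: test set of 120 patients = 60 cases + 60 controls.
    m = classification_metrics(tp=50, fn=10, tn=54, fp=6)
    print({k: round(v, 3) for k, v in m.items()})
    # {'sensitivity': 0.833, 'specificity': 0.9, 'accuracy': 0.867}

    # HYPOTHETICAL: treat the same counts as the operating point at pt = 0.20.
    print(round(net_benefit(tp=50, fp=6, n=120, pt=0.20), 4))  # 0.4042
    # "Treat all" reference strategy: every case is a TP, every control an FP.
    print(round(net_benefit(tp=60, fp=60, n=120, pt=0.20), 4))  # 0.375
```

Because the matched design fixes prevalence at 50% in this sample, net benefit values computed this way describe the study sample rather than a routine BC population, where co-occurring TC is far less common than 50%.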

The XGBoost-based model integrates radiotherapy exposure, hormonal profiles, and tumor biology to stratify TC risk in BC patients. It offers a clinically applicable tool for personalized surveillance, balancing early detection with resource optimization. External validation is warranted before implementation.

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989), thyroid cancer (MONDO:0002108)

## Full-text entities

- **Genes:** EREG (epiregulin) [NCBI Gene 2069] {aka EPR, ER, Ep}
- **Diseases:** tumor (MESH:D009369), TC (MESH:D013964), BC (MESH:D001943)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13035759/full.md

---
Source: https://tomesphere.com/paper/PMC13035759