# Development and validation of a machine learning model for predicting high-risk distant metastatic recurrence in differentiated thyroid cancer

**Authors:** Fei Yang, Jie Zhang, Tengfei Liu, Zhijun Zhao

PMC · DOI: 10.3389/fmed.2026.1790226 · Frontiers in Medicine · 2026-03-09

## TL;DR

An XGBoost-based machine learning model predicted high-risk distant metastatic recurrence in patients with differentiated thyroid cancer, reaching an AUC of 0.88 in the validation set and showing superior clinical utility compared to the TNM staging system.

## Contribution

The study introduces an XGBoost-based machine learning model, built on eight LASSO-selected predictors, for predicting high-risk distant metastatic recurrence in differentiated thyroid cancer.

## Key findings

- The XGBoost model achieved an AUC of 0.88 (95% CI, 0.83–0.93) in the validation set for predicting distant metastatic recurrence.
- Patients were stratified into low, intermediate, and high-risk groups with recurrence rates of 1.7%, 14.4%, and 64.1%, respectively.
- The model showed good calibration and, on decision curve analysis, superior clinical utility compared to the TNM staging system.

## Abstract

Distant metastatic recurrence significantly impacts the prognosis of patients with differentiated thyroid cancer (DTC). Current risk stratification systems have limited accuracy in predicting high-risk distant metastatic recurrence.

This study aimed to develop and validate a machine learning model for predicting high-risk distant metastatic recurrence in DTC patients.

We retrospectively analyzed 1,245 DTC patients treated between January 2020 and December 2024. Patients were randomly divided into training (n = 871) and validation (n = 374) sets. Forty-two clinical, pathological, molecular, and treatment-related variables were collected. LASSO regression was used for feature selection. Six machine learning algorithms (Random Forest, Support Vector Machine, XGBoost, Logistic Regression, K-Nearest Neighbors, and Decision Tree) were employed to build prediction models. Model performance was evaluated using AUC, accuracy, sensitivity, specificity, and F1-score. Calibration was assessed using calibration curves, and clinical utility was evaluated using decision curve analysis.
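The evaluation metrics named above have compact definitions. The following is a minimal Python sketch, not the authors' code: the function names, the toy labels and probabilities, and the 0.5 and 0.2 thresholds are illustrative assumptions. It computes AUC as a pairwise ranking probability, the threshold-based metrics from a confusion matrix, and the net benefit used in decision curve analysis, taken here in its standard form `TP/n - (FP/n) * pt / (1 - pt)` for threshold probability `pt`.

```python
from typing import Dict, Sequence, Tuple


def auc_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_counts(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when predicting positive for prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1-score at one threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on the model at threshold probability pt."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of treating every patient, the usual reference curve."""
    n = len(y_true)
    positives = sum(y_true)
    return positives / n - ((n - positives) / n) * threshold / (1 - threshold)


# Toy data (illustrative only, not from the study).
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_probs = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]

print("AUC:", auc_score(toy_labels, toy_probs))
print("Metrics at 0.5:", classification_metrics(toy_labels, toy_probs, 0.5))
print("Net benefit at 0.2:", net_benefit(toy_labels, toy_probs, 0.2))
print("Treat-all at 0.2:", net_benefit_treat_all(toy_labels, 0.2))
```

A decision curve repeats the net benefit calculation over a range of thresholds and compares the model against the treat-all and treat-none (net benefit of zero) strategies; a model has clinical utility where its curve lies above both.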

During a median follow-up of 72 months, 126 patients (10.1%) developed distant metastatic recurrence. LASSO regression identified eight predictors: age, tumor size, extrathyroidal extension, lymph node metastasis, BRAF V600E mutation, postoperative stimulated thyroglobulin (sTg) level, radioactive iodine dose, and TNM stage. The XGBoost model demonstrated the best performance, with an AUC of 0.88 (95% CI, 0.83–0.93) in the validation set. Patients were stratified into low-risk (recurrence rate: 1.7%), intermediate-risk (14.4%), and high-risk (64.1%) groups with significantly different distant metastasis-free survival (p < 0.001). The XGBoost model showed good calibration and superior clinical utility compared to the TNM staging system.
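The three-tier stratification can be illustrated with a short Python sketch. This summary does not state the probability cutoffs the authors used, so `LOW_CUTOFF` and `HIGH_CUTOFF` below are hypothetical placeholders, as are the function names and the toy data. The sketch assigns each patient to a group from the predicted probability and reports the observed recurrence rate per group; it also checks the overall rate reported above (126 of 1,245 patients).

```python
from collections import defaultdict
from typing import Dict, Sequence

# Hypothetical cutoffs for illustration; the study's actual cutoffs are not given here.
LOW_CUTOFF = 0.10
HIGH_CUTOFF = 0.50


def risk_group(prob: float, low: float = LOW_CUTOFF, high: float = HIGH_CUTOFF) -> str:
    """Map a predicted recurrence probability to a risk tier."""
    if prob < low:
        return "low"
    if prob < high:
        return "intermediate"
    return "high"


def recurrence_rate_by_group(
    probs: Sequence[float], events: Sequence[int]
) -> Dict[str, float]:
    """Observed recurrence rate (events / patients) within each risk tier."""
    counts = defaultdict(lambda: [0, 0])  # group -> [events, patients]
    for prob, event in zip(probs, events):
        group = risk_group(prob)
        counts[group][0] += event
        counts[group][1] += 1
    return {group: ev / n for group, (ev, n) in counts.items()}


# Toy predictions and outcomes (illustrative only, not study data).
toy_probs = [0.02, 0.05, 0.08, 0.15, 0.30, 0.45, 0.55, 0.80, 0.95]
toy_events = [0, 0, 0, 0, 1, 0, 1, 0, 1]

for group, rate in recurrence_rate_by_group(toy_probs, toy_events).items():
    print(f"{group}: {rate:.1%}")

# Consistency check on the reported overall recurrence rate.
overall_rate_percent = round(126 / 1245 * 100, 1)
print("Overall recurrence rate (%):", overall_rate_percent)
```

In practice the cutoffs would be chosen on the training set and then applied unchanged to the validation set, so that the per-group recurrence rates are not tuned to the data they are reported on.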

We developed and validated an XGBoost-based machine learning model that accurately predicts high-risk distant metastatic recurrence in DTC patients. This model may help clinicians identify patients who could benefit from more aggressive treatment and intensive follow-up, enabling personalized management strategies.

## Linked entities

- **Genes:** BRAF (B-Raf proto-oncogene, serine/threonine kinase) [NCBI Gene 673]
- **Diseases:** differentiated thyroid cancer (MONDO:0015447)

## Full-text entities

- **Genes:** BRAF (B-Raf proto-oncogene, serine/threonine kinase) [NCBI Gene 673] {aka B-RAF1, B-raf, BRAF-1, BRAF1, NS7, RAFB1}, TG (thyroglobulin) [NCBI Gene 7038] {aka AITD3, TGN}
- **Diseases:** distant metastasis (MESH:D009362), lymph node metastasis (MESH:D008207), tumor (MESH:D009369), DTC (MESH:D013964)
- **Chemicals:** iodine (MESH:D007455)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Mutations:** V600E

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13006264/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13006264/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC13006264/full.md

---
Source: https://tomesphere.com/paper/PMC13006264