# Machine learning-driven PET-CT and clinical pathology model for predicting mediastinal lymph node metastasis in non-small cell lung cancer: a retrospective cohort study

**Authors:** Taiyu Bi, Min Qiang, Xiaotian Duan, Yipeng Yin, Wenyu Zhang, Zhe Chen, Xinjun Zhang, Jianzun Ma, Bowei Zhang, Mingbo Tang, Wei Liu

PMC · DOI: 10.7717/peerj.20788 · PeerJ · 2026-02-03

## TL;DR

This study combines PET-CT imaging features with clinical and pathological data in machine learning models to predict mediastinal lymph node metastasis in non-small cell lung cancer; the best model reached an AUC of 0.90.

## Contribution

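A novel machine learning model (TLPC, built with XGBoost) that combines PET-CT features of primary tumors and lymph nodes with clinical and pathological data predicts mediastinal lymph node metastasis (MLNM) in NSCLC more accurately than models built on tumor or lymph node features alone.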

## Key findings

- The TLPC model achieved an AUC of 0.90, with a sensitivity of 0.96 and a specificity of 0.84 for predicting MLNM.
- XGBoost outperformed other machine learning models in predicting metastasis.
- PET-CT imaging features combined with clinical data offer a non-invasive approach for metastasis detection.

## Abstract

This study aims to evaluate whether Positron Emission Tomography–Computed Tomography (PET-CT) imaging features of primary tumors and lymph nodes, combined with clinical and pathological data, can accurately predict mediastinal lymph node metastasis (MLNM) in resectable non-small cell lung cancer (NSCLC) using machine learning models.

A retrospective study was conducted on 390 NSCLC patients who underwent tumor resection and lymph node dissection between January 2017 and December 2023. All patients received 18F-fluorodeoxyglucose (18F-FDG) PET-CT scans within two weeks before surgery. Data from 390 primary tumors and 1,026 lymph node stations were analyzed. Clinical and PET-CT imaging features were extracted, and features were selected using a random forest algorithm. Eight machine learning models were evaluated: logistic regression, classification and regression tree (CART), support vector machine (SVM), gradient boosting decision tree (GBDT), random forest, multi-layer perceptron (MLP), extreme gradient boosting (XGBoost), and k-nearest neighbors (KNN).
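The abstract says features were selected with a random forest and several classifiers were then compared, but gives no settings. Below is a minimal scikit-learn sketch of that general workflow under stated assumptions: the data are synthetic and hypothetical, `TOP_K`, the hyperparameters, and the five-model subset are illustrative choices, and `GradientBoostingClassifier` stands in for XGBoost so that only scikit-learn is needed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

TOP_K = 3  # hypothetical number of features to keep

# Hypothetical station-level table: 8 features, only columns 0-2 drive the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))
logit = 2.0 * X[:, 0] + 2.0 * X[:, 1] - 2.0 * X[:, 2] - 1.5  # negative intercept: positives are the minority
y = (rng.random(600) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Step 1: rank features by random-forest impurity importance on the full table.
forest = RandomForestClassifier(n_estimators=200, min_samples_leaf=5, random_state=0)
forest.fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]
top_features = sorted(ranking[:TOP_K].tolist())
print("top features:", top_features)


def rf_selector():
    """Keep the TOP_K most important features; refit on each CV training fold."""
    return SelectFromModel(
        RandomForestClassifier(n_estimators=100, min_samples_leaf=5, random_state=0),
        threshold=-np.inf,  # select purely by importance rank, capped by max_features
        max_features=TOP_K,
    )


# Step 2: compare candidate classifiers by cross-validated AUC.
candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "CART": DecisionTreeClassifier(max_depth=4, random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),  # stand-in for XGBoost
    "KNN": KNeighborsClassifier(n_neighbors=15),
}
cv_auc = {}
for name, model in candidates.items():
    pipe = make_pipeline(rf_selector(), StandardScaler(), model)
    cv_auc[name] = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:<18} mean 5-fold AUC = {cv_auc[name]:.3f}")
```

Putting the selector inside the pipeline means the feature ranking is recomputed on each training fold, so the held-out fold never influences which features are kept; this summary does not say whether the authors selected features inside or outside their validation loop.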

Three models were built from different feature combinations: Tumor-Pathology-Clinical (TPC), Lymph-Pathology-Clinical (LPC), and Tumor-Lymph-Pathology-Clinical (TLPC). Model performance was assessed using Receiver Operating Characteristic (ROC) curves, Decision Curve Analysis (DCA), and confusion matrices.
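ROC curves, confusion matrices, and decision curves summarize the same predicted probabilities in different ways. Here is a minimal pure-Python sketch of each, assuming binary labels (1 = metastatic station) and calibrated predicted probabilities; the function names and toy data are hypothetical, and the net-benefit formula is the standard decision-curve definition rather than a detail stated in this summary.

```python
from typing import Sequence, Tuple


def pair_score(pos: float, neg: float) -> float:
    """1 if the positive case outscores the negative case, 0.5 for a tie, else 0."""
    return 1.0 if pos > neg else 0.5 if pos == neg else 0.0


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney probability that a random positive outscores a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    return sum(pair_score(p, n) for p in pos for n in neg) / (len(pos) * len(neg))


def confusion(labels: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[int, int, int, int]:
    """(TP, FP, TN, FN) when scores >= threshold are called metastatic."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sens_spec(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(labels: Sequence[int], scores: Sequence[float], pt: float) -> float:
    """Decision-curve net benefit at threshold probability pt: TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion(labels, scores, pt)
    n = len(labels)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(labels: Sequence[int], pt: float) -> float:
    """Reference curve that treats every station as metastatic."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical toy data: 4 metastatic and 4 benign stations.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.6, 0.4, 0.7, 0.3, 0.5, 0.8, 0.2, 0.1]
print("AUC:", roc_auc(labels, scores))
print("sensitivity, specificity at 0.5:", sens_spec(*confusion(labels, scores, 0.5)))
print("net benefit at pt=0.2:", net_benefit(labels, scores, 0.2),
      "vs treat-all:", net_benefit_treat_all(labels, 0.2))
```

A model is useful at a given threshold probability only where its net benefit exceeds both the treat-all curve and zero (treat none), which is how a decision curve is usually read.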

The TLPC model based on the XGBoost algorithm performed best, with an Area Under the Curve (AUC) of 0.90 (95% CI [0.883–0.957]), a specificity of 0.84, and a sensitivity of 0.96 (P = 0.0069). The LPC model showed intermediate performance, with an AUC of 0.78 (95% CI [0.713–0.751]), a specificity of 0.73, and a sensitivity of 0.84 (P = 0.0269). The TPC model performed worst, with an AUC of 0.67 (95% CI [0.647–0.703]), a specificity of 0.46, and a sensitivity of 0.56 (P = 0.7037; not significant). All P-values were derived from DeLong’s test comparing AUCs between models, with statistical significance defined as P < 0.05. Of the 1,026 lymph node stations analyzed, 204 showed metastasis and 822 did not. XGBoost consistently outperformed the other models in predicting MLNM.
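The P-values above come from DeLong’s test, which compares two AUCs measured on the same cases while accounting for their correlation. The sketch below is a minimal pure-Python version of the standard structural-components form (quadratic in the number of cases, not the faster sort-based variant); the function name `delong_test` and the toy scores are hypothetical, and this summary does not state which model pairs or software the authors used.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Kernel: 1 if positive score x beats negative score y, 0.5 for a tie, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos: List[float], neg: List[float]) -> Tuple[List[float], List[float], float]:
    """DeLong structural components (one per positive, one per negative) and the AUC."""
    v10 = [sum(_psi(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, n) for p in pos) / len(pos) for n in neg]
    return v10, v01, sum(v10) / len(v10)


def _cov(a: List[float], b: List[float], mean_a: float, mean_b: float) -> float:
    """Sample covariance with an (n - 1) denominator."""
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels: Sequence[int], scores_a: Sequence[float],
                scores_b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (AUC_a, AUC_b, two-sided P) for two models scored on the same cases."""
    pos_idx = [i for i, l in enumerate(labels) if l == 1]
    neg_idx = [i for i, l in enumerate(labels) if l == 0]
    m, n = len(pos_idx), len(neg_idx)
    (v10a, v01a, auc_a), (v10b, v01b, auc_b) = [
        _components([s[i] for i in pos_idx], [s[i] for i in neg_idx])
        for s in (scores_a, scores_b)
    ]
    # Var(AUC_a - AUC_b) = (S10_aa + S10_bb - 2 S10_ab) / m + (S01_aa + S01_bb - 2 S01_ab) / n
    s10 = (_cov(v10a, v10a, auc_a, auc_a) + _cov(v10b, v10b, auc_b, auc_b)
           - 2 * _cov(v10a, v10b, auc_a, auc_b))
    s01 = (_cov(v01a, v01a, auc_a, auc_a) + _cov(v01b, v01b, auc_b, auc_b)
           - 2 * _cov(v01a, v01b, auc_a, auc_b))
    var = s10 / m + s01 / n
    if var <= 0:  # e.g. identical score vectors
        return auc_a, auc_b, 1.0 if auc_a == auc_b else 0.0
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value


# Hypothetical toy scores for two models on the same 8 stations.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
model_b = [0.6, 0.4, 0.7, 0.3, 0.5, 0.8, 0.2, 0.1]
print(delong_test(labels, model_a, model_b))
```

Because the test is paired, it needs both models’ scores for the same stations; comparing AUCs reported on different case sets would call for an unpaired test instead.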

Combining PET-CT imaging features of primary tumors and lymph nodes with clinical and pathological data shows promise for accurately predicting MLNM in NSCLC. The TLPC model offers a non-invasive method for identifying lymph node metastasis, supporting personalized treatment strategies. However, because PET-CT was performed selectively rather than routinely, external validation across diverse clinical settings is warranted to confirm the model’s generalizability.

## Linked entities

- **Chemicals:** 18F-fluorodeoxyglucose (18F-FDG; PubChem CID 68614)
- **Diseases:** non-small cell lung cancer (MONDO:0005233)

## Full-text entities

- **Diseases:** metastasis (MESH:D009362), Tumor (MESH:D009369), MLNM (MESH:D008207), NSCLC (MESH:D002289)
- **Chemicals:** 18F-FDG (MESH:D019788)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12880095/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12880095/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC12880095/full.md

---
Source: https://tomesphere.com/paper/PMC12880095