# Artificial intelligence for TNM staging in NSCLC: a critical appraisal of segmentation utility in [¹⁸F]FDG PET/CT

**Authors:** Maurice M. Heimer, Jakob Dexl, Johanna Ta, Ricarda Ebner, Felix L. Herr, Leon Orasanin, Katharina Jeblick, Lisa C. Adams, Lalith K. Shiyam Sundar, Amanda Tufman, Rudolf A. Werner, Gabriel Sheikh, Jens Ricke, Michael Ingrisch, Matthias P. Fabritius, Clemens C. Cyran

PMC · DOI: 10.1007/s00259-025-07677-2 · European Journal of Nuclear Medicine and Molecular Imaging · 2025-11-23

## TL;DR

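This retrospective single-centre study evaluates the top-performing autoPET III segmentation algorithm for staging non-small cell lung cancer on [¹⁸F]FDG PET/CT. The model detects lesions with high sensitivity (95.8%) but systematically overpredicts disease extent, so the UICC stage derived from its masks matched the reference in only 207 of 306 patients.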

## Contribution

The study introduces a clinically driven error analysis framework to assess AI segmentation performance in cancer staging.

## Key findings

- The AI model achieved 95.8% lesion-level sensitivity and 87.5% precision; false-positive M-category lesions (n = 196) were the most frequent error.
- 35.7% of false positives were benign, and 34.7% were non-oncologic pathologies.
- Only 67.7% of UICC stagings were accurate using AI masks, indicating the need for manual review.

## Abstract

This study aims to investigate whether a diagnostic AI model can effectively support lesion detection and staging in non-small cell lung cancer (NSCLC) [¹⁸F]FDG PET/CT studies, focusing on the distinction between technical segmentation accuracy and clinically meaningful performance.

In this retrospective single-centre study, [¹⁸F]FDG PET/CT scans from 306 treatment-naïve NSCLC patients were reviewed with reference to multidisciplinary team decisions. Tumour lesions were manually segmented for reference and compared with predictions from the top-performing algorithm of the autoPET III challenge. Quantitative segmentation metrics were calculated, and lesion-level errors were assessed for impact on patient-level TNM and UICC staging.
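The overlap metric named in the results, the Dice Similarity Coefficient, compares a predicted mask with the manual reference as twice the shared volume divided by the sum of both volumes. The following is a minimal sketch, assuming binary masks flattened to 0/1 sequences; the function name, the toy masks, and the handling of two empty masks are illustrative assumptions, not the study's implementation.

```python
from typing import Sequence


def dice_coefficient(reference: Sequence[int], prediction: Sequence[int]) -> float:
    """Dice Similarity Coefficient for two binary masks given as flat 0/1 sequences."""
    if len(reference) != len(prediction):
        raise ValueError("masks must contain the same number of voxels")
    ref = [bool(v) for v in reference]
    pred = [bool(v) for v in prediction]
    # Voxels labelled as lesion in both masks.
    overlap = sum(r and p for r, p in zip(ref, pred))
    total = sum(ref) + sum(pred)
    # Assumption: two empty masks are treated as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Hypothetical toy masks: 3 reference voxels, 3 predicted voxels, 2 shared.
reference_mask = [0, 1, 1, 1, 0, 0]
predicted_mask = [0, 0, 1, 1, 1, 0]
print(round(dice_coefficient(reference_mask, predicted_mask), 3))  # 0.667
```

Because the coefficient is driven by voxel overlap, a prediction can score well on large primary tumours while still missing or adding small lesions that change the stage, which is the distinction the study draws between technical and clinical performance.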

The algorithm achieved a mean Dice Similarity Coefficient (DSC) of 0.64. Lesion-level sensitivity was 95.8% across all patients, with a precision of 87.5%. False-positive M-category lesions (n = 196) were the most frequent error. Of all false positives, 35.7% were benign and 34.7% were non-oncologic pathologies. UICC staging matched ground truth in 207/306 patients, with most discordances due to upstaging (88/306).
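The lesion-level and patient-level figures above rest on two simple calculations: sensitivity and precision from counts of true-positive, false-negative, and false-positive lesions, and a per-patient comparison of the predicted stage against the reference stage. The sketch below illustrates both under stated assumptions: the lesion counts and stage lists are hypothetical toy data, and the four-level `STAGE_ORDER` is a simplification, since UICC staging also distinguishes substages.

```python
from collections import Counter
from typing import Sequence

# Simplified ordinal ranking of UICC stage groups (assumption: substages are ignored).
STAGE_ORDER = ["I", "II", "III", "IV"]


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of reference lesions that the model detected."""
    if tp + fn == 0:
        raise ValueError("no reference lesions")
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted lesions that correspond to a reference lesion."""
    if tp + fp == 0:
        raise ValueError("no predicted lesions")
    return tp / (tp + fp)


def compare_staging(reference: Sequence[str], predicted: Sequence[str]) -> Counter:
    """Count patients whose predicted stage is concordant, upstaged, or downstaged."""
    rank = {stage: i for i, stage in enumerate(STAGE_ORDER)}
    outcome: Counter = Counter()
    for ref, pred in zip(reference, predicted):
        if rank[pred] == rank[ref]:
            outcome["concordant"] += 1
        elif rank[pred] > rank[ref]:
            outcome["upstaged"] += 1
        else:
            outcome["downstaged"] += 1
    return outcome


# Hypothetical lesion counts, not taken from the paper.
print(sensitivity(tp=90, fn=10))  # 0.9
print(precision(tp=90, fp=30))    # 0.75

# Hypothetical stages for five patients.
reference_stages = ["I", "II", "III", "IV", "II"]
predicted_stages = ["I", "III", "III", "IV", "I"]
print(dict(compare_staging(reference_stages, predicted_stages)))
```

Separating upstaging from downstaging, rather than reporting a single accuracy, makes the direction of the error visible, which matters here because the reported discordances were mostly upstaging.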

Clinically driven metrics and cause-based error analysis offer valuable insight into AI segmentation performance. The evaluated model showed excellent lesion sensitivity but a tendency towards systematic overprediction across TNM categories. At the lesion level, M-category false positives and undersegmentation in the hilar region emerged as the main drivers of clinically relevant upstaging. Despite promising lesion detection sensitivity, only 67.7% of UICC stagings were accurate using AI masks, indicating that diagnostic AI may support, though not yet replace, manual lesion evaluation in NSCLC [¹⁸F]FDG PET/CT.

The online version contains supplementary material available at 10.1007/s00259-025-07677-2.

## Linked entities

- **Diseases:** non-small cell lung cancer (MONDO:0005233), NSCLC (MONDO:0005233)

## Full-text entities

- **Diseases:** NSCLC (MESH:D002289), Tumour lesions (MESH:D009369)
- **Chemicals:** [18F]FDG (MESH:D019788)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13013355/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13013355/full.md

## References

1 reference; full list in the complete paper: https://tomesphere.com/paper/PMC13013355/full.md

---
Source: https://tomesphere.com/paper/PMC13013355