# Non-inferiority of automated deep learning-based [18F]FDG PET/CT tumour volume compared to manual GTV for prognostic modelling in head and neck cancer

**Authors:** David G. Kovacs, Katrin Håkansson, Jacob Rasmussen, Barbara M. Fischer, Flemming L. Andersen, Claes N. Ladefoged

PMC · DOI: 10.1186/s13550-026-01377-0 · EJNMMI Research · 2026-02-06

## TL;DR

AI-generated tumour volumes from [18F]FDG PET/CT are non-inferior to manually delineated gross tumour volumes for predicting loco-regional failure and distant metastasis in head and neck cancer.

## Contribution

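This study shows that deep learning-based PET tumour volumes (AI-PET-GTV) are non-inferior to manual GTVs for prognostic modelling of loco-regional failure and distant metastasis in head and neck cancer. The primary test used a non-inferiority margin of 5 percentage points on the 3-year AUC for loco-regional failure.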

## Key findings

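- For loco-regional failure (LRF), AI-PET-GTV and manual GTV had nearly identical AUCs: 77.3% vs 76.9% at 1 year and 72.9% vs 72.8% at 3 years (p = 0.02 at both timepoints). Similar patterns were observed for distant metastasis (all p < 0.01).
- Brier scores favoured AI-PET-GTV at both 1 and 3 years (p < 0.02).
- Risk-stratified cumulative incidence estimates were nearly identical between the AI and manual models. For example, the 3-year cumulative incidence of LRF in the high-risk group was 38.4% (95% CI: 32.6–44.2%) for both.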
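
To make the cumulative incidence figures concrete, here is a minimal sketch of a nonparametric cumulative incidence estimate under competing risks (the Aalen-Johansen form). This summary does not state which estimator the authors used, so the choice is an assumption, and the function name, event coding (0 = censored, 1 = LRF, 2 = competing event), and toy data are all hypothetical.

```python
def cumulative_incidence(times, events, cause, horizon):
    """Aalen-Johansen estimate of P(event of type `cause` by `horizon`).

    times  : follow-up time per patient
    events : 0 = censored, any other integer = event type (hypothetical coding)
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any type before the current time
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        n_here = d_any = d_cause = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_here += 1
            if data[i][1] != 0:
                d_any += 1
                if data[i][1] == cause:
                    d_cause += 1
            i += 1
        # Chance of reaching t event-free, then having the event of interest at t.
        cif += event_free * d_cause / n_at_risk
        event_free *= 1.0 - d_any / n_at_risk
        n_at_risk -= n_here
    return cif


# Toy data only, not from the paper: times in years.
toy_times = [1, 2, 3, 4]
toy_events = [1, 2, 1, 0]
lrf_by_3_years = cumulative_incidence(toy_times, toy_events, cause=1, horizon=3)
```

Treating the competing event as censoring (a plain Kaplan-Meier complement) would overstate the incidence, which is why a competing-risk estimator is the natural fit for LRF and DM endpoints.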

## Abstract

Manual segmentation of gross tumour volumes (GTV) on [18F]FDG PET/CT is time-consuming and subject to interobserver variability, limiting its scalability for prognostic modelling in head and neck cancer. We investigated whether deep learning-based PET tumour volumes (AI-PET-GTV) could replace manually defined GTVs in risk prediction models for loco-regional failure (LRF) and distant metastasis (DM).

Using competing risk regression, we tested whether AI-PET-GTV was non-inferior to manual GTV in predicting LRF, with the primary outcome being area under the receiver operating characteristic curve (AUC) at 3 years, using a non-inferiority margin of 5 percentage points. AI-PET-GTV achieved a 3-year AUC of 72.9% (95% CI: 67.9–77.9%) compared to 72.8% (95% CI: 67.8–77.9%) for manual GTV (p = 0.02). At 1 year, AUCs were 77.3% (95% CI: 72.2–82.4%) and 76.9% (95% CI: 71.9–82.0%) for AI and manual GTV, respectively (p = 0.02). Similar patterns were observed for DM prediction at 1 and 3 years (all p < 0.01), and Brier scores also favoured AI-PET-GTV at both timepoints (p < 0.02). Stratification based on predicted risk yielded nearly identical cumulative incidence estimates. For example, the 3-year cumulative incidence of LRF in the high-risk group was 38.4% (95% CI: 32.6–44.2%) for both models.
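
The logic of the non-inferiority test can be sketched as a one-sided test of the null hypothesis that the AI-based AUC is worse than the manual AUC by at least the margin. The sketch below uses a normal approximation, which is an assumption: the abstract does not say how the p-values were computed. The AUCs and the 5-point margin come from the abstract, while the standard error of the AUC difference is a hypothetical placeholder, not a reported value.

```python
import math


def noninferiority_p_value(auc_new, auc_ref, se_diff, margin):
    """One-sided p-value for H0: auc_new - auc_ref <= -margin.

    All quantities are in percentage points. Uses a normal approximation
    for the AUC difference (an assumption, not the paper's stated method).
    """
    z = (auc_new - auc_ref + margin) / se_diff
    # Upper-tail probability of the standard normal via the error function.
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


# 3-year LRF AUCs and margin from the abstract.
# SE_DIFF is a made-up placeholder, not a value from the paper.
SE_DIFF = 2.5
p = noninferiority_p_value(72.9, 72.8, SE_DIFF, margin=5.0)
print(f"one-sided p = {p:.3f}")
```

A small p-value rejects the hypothesis that AI-PET-GTV is worse by 5 points or more. It does not show that the two AUCs are equal, only that any deficit is smaller than the margin.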

Automated deep learning-based PET tumour volumes are non-inferior to manual GTVs for prognostic modelling of LRF and DM in head and neck cancer. These findings support clinical implementation of AI-derived volumes for reproducible, scalable, and earlier risk stratification in oncology workflows.

The online version contains supplementary material available at 10.1186/s13550-026-01377-0.

## Linked entities

- **Chemicals:** [18F]FDG (PubChem CID 68614)
- **Diseases:** head and neck cancer (MONDO:0005627)

## Full-text entities

- **Diseases:** head and neck cancer (MESH:D006258), tumour (MESH:D009369)
- **Chemicals:** [18F]FDG (MESH:D019788)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12972335/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12972335/full.md

---
Source: https://tomesphere.com/paper/PMC12972335