# Assessing the robustness and clinical evaluation of a deep-learning segmentation model for head and neck cancer

**Authors:** Daniel H. Schanne, Léandre Cuenot, Sarah Brüningk, Mauricio Reyes, Olgun Elicin

PMC · DOI: 10.3389/fonc.2026.1731007 · Frontiers in Oncology · 2026-02-13

## TL;DR

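This study evaluates how robustly a deep learning PET/CT model segments primary (GTVp) and nodal (GTVn) gross tumor volumes in head and neck cancer when images are degraded by synthetic perturbations such as noise, blur, and bias-field artifacts.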

## Contribution

The study introduces a robustness evaluation framework for DL-based segmentation models in head and neck cancer using synthetic image perturbations.
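To make the idea of a severity-graded synthetic perturbation concrete, the following is a minimal sketch of one perturbation type (additive Gaussian noise) applied at several severity levels. It is an illustration only and not the authors' implementation: the `SEVERITY_SIGMA` mapping, the function names, and the toy flat-list image are all assumptions, and the paper's actual severity settings are not given in this summary.

```python
import random

# Hypothetical mapping from severity level to noise standard deviation.
# These values are placeholders, not the settings used in the study.
SEVERITY_SIGMA = {0: 0.0, 1: 0.05, 2: 0.10, 3: 0.20}


def add_gaussian_noise(voxels, severity, seed=0):
    """Return a noisy copy of `voxels`, a flat list of normalized intensities."""
    sigma = SEVERITY_SIGMA[severity]
    rng = random.Random(seed)  # fixed seed keeps each variant reproducible
    return [v + rng.gauss(0.0, sigma) for v in voxels]


def mean_abs_change(original, perturbed):
    """Average absolute intensity change introduced by a perturbation."""
    return sum(abs(a - b) for a, b in zip(original, perturbed)) / len(original)


# Toy "image": 100 intensities ramping from 0 to 1.
image = [i / 99 for i in range(100)]

# One perturbed variant per severity level.
variants = {s: add_gaussian_noise(image, s) for s in SEVERITY_SIGMA}

for severity, variant in variants.items():
    print(severity, round(mean_abs_change(image, variant), 4))
```

In a robustness evaluation of this kind, each variant would be passed through the trained model and its segmentation compared against the unperturbed baseline. A real pipeline would operate on 3D PET and CT volumes and cover the other perturbation types as well.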

## Key findings

- Baseline Dice scores for GTVp and GTVn were 0.766 and 0.698, respectively.
- Clinical usability for GTVn dropped to 27.9% under severe image perturbations.
- High PET contrast helped mitigate some perturbation effects.

## Abstract

Deep learning (DL)-based autosegmentation has improved delineation of organs at risk in radiotherapy for head and neck cancer (HNC). However, automated segmentation of gross tumor volumes (GTVp, GTVn) remains challenging, and robustness under real-world imaging conditions is insufficiently characterized. This study evaluates the robustness and clinical usability of a DL-based PET/CT segmentation model for HNC under clinically relevant perturbations.

A 3D Dynamic U-Net was trained on the public HECKTOR 2022 dataset (474 training, 50 test cases). Synthetic perturbations (noise, blur, ghosting, bias-field, spike noise, and motion) were applied to PET and CT images at varying severity levels, generating 36 variants per patient. Segmentation quality was measured using Dice score, Hausdorff Distance, and accuracy. Clinical usability was assessed for 50 baseline and 18 perturbed cases by two clinicians using a five-point Likert scale. Radiomic features were correlated with robustness metrics.
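For readers unfamiliar with the overlap and boundary metrics named above, here is a minimal sketch of the Dice score and the symmetric Hausdorff distance computed on sets of voxel coordinates. It uses plain Python to stay dependency-free and is not the evaluation code used in the study. The function names, the empty-mask convention, and the toy masks are assumptions for illustration.

```python
from math import dist


def dice_score(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def hausdorff_distance(pred, truth):
    """Symmetric Hausdorff distance between two non-empty voxel sets."""
    pred, truth = set(pred), set(truth)
    if not pred or not truth:
        raise ValueError("Hausdorff distance is undefined for an empty mask")

    def directed(a, b):
        # Largest distance from any point in `a` to its nearest point in `b`.
        return max(min(dist(p, q) for q in b) for p in a)

    return max(directed(pred, truth), directed(truth, pred))


# Toy masks: a 4x4 reference square and a prediction shifted by one voxel in x.
truth = {(x, y, 0) for x in range(4) for y in range(4)}
pred = {(x, y, 0) for x in range(1, 5) for y in range(4)}

print(dice_score(pred, truth))          # 0.75
print(hausdorff_distance(pred, truth))  # 1.0
```

The brute-force Hausdorff computation is quadratic in the number of voxels, so it suits small examples only. Distances here are in voxel units, whereas clinical reporting would normally scale by voxel spacing to obtain millimetres.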

Baseline Dice scores were 0.766 (GTVp) and 0.698 (GTVn). Performance dropped significantly under spike noise and bias-field artifacts, especially for GTVn. Clinical usability remained high for GTVp (77.8%) but declined to 27.9% for GTVn under severe perturbations. Lesion volume and surface complexity positively correlated with robustness degradation, while high PET contrast offered protective effects against certain perturbations.

DL-based PET/CT segmentation models for HNC show strong baseline performance and robustness for primary tumors. However, nodal tumor segmentation remains vulnerable to specific image artifacts. Enhancing robustness through targeted data augmentation and validation under variable conditions is essential for clinical integration.

## Linked entities

- **Diseases:** head and neck cancer (MONDO:0005627)

## Full-text entities

- **Diseases:** brain tumor (MESH:D001932), MR (MESH:D008944), lesion (MESH:D009059), DL (MESH:D007859), Cancer (MESH:D009369), oropharyngeal cancer (MESH:D009959), HNC (MESH:D006258), nodal (MESH:D013611), nodal metastases (MESH:D009362)
- **Chemicals:** FDG (MESH:D019788), GTVn (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12945774/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12945774/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC12945774/full.md

---
Source: https://tomesphere.com/paper/PMC12945774