# A Multimodal and Multi-centric Head and Neck Cancer Dataset for Segmentation, Diagnosis and Outcome Prediction

**Authors:** Numan Saeed, Salma Hassan, Shahad Hardan, Ahmed Aly, Darya Taratynova, Umair Nawaz, Ufaq Khan, Muhammad Ridzuan, Vincent Andrearczyk, Adrien Depeursinge, Yutong Xie, Thomas Eugene, Rapha\"el Metz, M\'elanie Dore, Gregory Delpon, Vijay Ram Kumar Papineni, Kareem Wahid, Cem Dede, Alaa Mohamed Shawky Ali, Carlos Sjogreen, Mohamed Naser, Clifton D. Fuller, Valentin Oreiller, Mario Jreige, John O. Prior, Catherine Cheze Le Rest, Olena Tankyevych, Pierre Decazes, Su Ruan, Stephanie Tanadini-Lang, Martin Valli\`eres, Hesham Elhalawani, Ronan Abgral, Romain Floch, Kevin Kerleguer, Ulrike Schick, Maelle Mauguen, David Bourhis, Jean-Christophe Leclere, Amandine Sambourg, Arman Rahmim, Mathieu Hatt, Mohammad Yaqub

arXiv: 2509.00367 · 2025-09-23

## TL;DR

This paper introduces a large, diverse multimodal head and neck cancer dataset with detailed annotations and clinical data, enabling advanced research in segmentation, diagnosis, and outcome prediction.

## Contribution

It provides a publicly available, multi-institutional dataset with comprehensive annotations and benchmarks for key clinical tasks using state-of-the-art deep learning models.

## Key findings

- Successful automated tumor segmentation using deep learning models
- Effective recurrence-free survival prediction with multimodal data
- Accurate HPV status classification leveraging clinical and imaging features

## Abstract

We present a publicly available multimodal dataset for head and neck cancer research, comprising 1123 annotated Positron Emission Tomography/Computed Tomography (PET/CT) studies from patients with histologically confirmed disease, acquired from 10 international medical centers. All studies contain co-registered PET/CT scans with varying acquisition protocols, reflecting real-world clinical diversity from a long-term, multi-institution retrospective collection. Primary gross tumor volumes (GTVp) and involved lymph nodes (GTVn) were manually segmented by experienced radiation oncologists and radiologists following established guidelines. We provide anonymized NifTi files, expert-annotated segmentation masks, comprehensive clinical metadata, and radiotherapy dose distributions for a patient subset. The metadata include TNM staging, HPV status, demographics, long-term follow-up outcomes, survival times, censoring indicators, and treatment information. To demonstrate its utility, we benchmark three key clinical tasks: automated tumor segmentation, recurrence-free survival prediction, and HPV status classification, using state-of-the-art deep learning models like UNet, SegResNet, and multimodal prognostic frameworks.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00367/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00367/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/2509.00367/full.md

---
Source: https://tomesphere.com/paper/2509.00367