# A Community Benchmark for the Automated Segmentation of Pediatric Neuroblastoma on Multi-Modal MRI: Design and Results of the SPPIN Challenge at MICCAI 2023

**Authors:** Myrthe A. D. Buser, Dominique C. Simons, Matthijs Fitski, Marc H. W. A. Wijnen, Annemieke S. Littooij, Annemiek H. ter Brugge, Iris N. Vos, Markus H. A. Janse, Mathijs de Boer, Rens ter Maat, Junya Sato, Shoji Kido, Satoshi Kondo, Satoshi Kasai, Marek Wodzinski, Henning Müller, Jin Ye, Junjun He, Yannick Kirchhoff, Maximilian R. Rokkus, Gao Haokai, Matías Fernández-Patón, Diana Veiga-Canuto, David G. Ellis, Michele Aizenberg, Bas H. M. van der Velden, Hugo Kuijf, Alberto de Luca, Alida F. W. van der Steeg

PMC · DOI: 10.3390/bioengineering12111157 · 2025-10-26

## TL;DR

This paper introduces a benchmark challenge for automatically segmenting pediatric neuroblastoma tumors in MRI scans to improve surgical planning.

## Contribution

The paper presents the first segmentation challenge in extracranial pediatric oncology and evaluates deep learning methods for tumor segmentation.

## Key findings

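- Nine teams provided a valid submission, and the median scores per team varied widely (Dice: 0.21–0.82; HD95: 63.31–7.69 mm; VS: 0.31–0.91).
- The top-performing team used a large, pre-trained model and achieved a median Dice score of 0.82, with an HD95 of 7.69 mm and a VS of 0.91.
- Evaluation scores were significantly lower on pre-operative segmentations; even the winning results were insufficient for surgical planning in small, pre-operative tumors.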

## Abstract

Surgery plays a key role in treating neuroblastoma. To assist surgical planning, anatomical 3D models derived from the segmentation of anatomical structures on MRI scans are often used. Automation using deep learning can make segmentations less time-consuming and more reliable. We organized the Surgical Planning in PedIatric Neuroblastoma (SPPIN) challenge to stimulate the development and benchmarking of automatic segmentation of neuroblastoma on MRI. SPPIN is the first segmentation challenge in extracranial pediatric oncology. Nine teams provided a valid submission. Evaluation was based on the Dice similarity coefficient (Dice score), the 95th percentile of the Hausdorff distance (HD95), and the volumetric similarity (VS). A combination of these scores determined the ranking of the teams. The spread in the median evaluation scores per team was large (Dice: 0.21–0.82; HD95: 63.31–7.69 mm; VS: 0.31–0.91). The top-performing team achieved a median Dice score of 0.82 (with an HD95 of 7.69 mm and a VS of 0.91) using a large, pre-trained model. However, in the pre-operative segmentations, significantly lower evaluation scores were observed. Our results indicate that pre-training might be useful in small, pediatric datasets. Although the general results of the winning team were high, they were insufficient to use for surgical planning in small, pre-operative tumors.
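
For readers unfamiliar with the three evaluation metrics named in the abstract, the following is a minimal pure-Python sketch of how they are commonly defined. It is not the challenge's evaluation code. Several choices here are assumptions: masks are represented as sets of `(z, y, x)` voxel indices, surfaces are found with 6-connectivity, the percentile uses the nearest-rank rule, and HD95 is taken as the larger of the two directed 95th-percentile surface distances. All function names are hypothetical.

```python
import math

def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty masks
    return 2 * len(a & b) / (len(a) + len(b))

def volumetric_similarity(a, b):
    """VS = 1 - | |A| - |B| | / (|A| + |B|); ignores where the voxels are."""
    if not a and not b:
        return 1.0
    return 1 - abs(len(a) - len(b)) / (len(a) + len(b))

def surface(voxels):
    """Voxels with at least one 6-connected neighbour outside the mask."""
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return {
        v for v in voxels
        if any((v[0] + dz, v[1] + dy, v[2] + dx) not in voxels
               for dz, dy, dx in offsets)
    }

def percentile_nearest_rank(values, q):
    """q-th percentile using the nearest-rank rule (an assumed choice)."""
    ordered = sorted(values)
    k = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[k - 1]

def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile Hausdorff distance between mask surfaces, in mm.

    Brute force, O(n*m) in the number of surface voxels; fine for a toy
    example, too slow for full-resolution MRI volumes.
    """
    def to_mm(v):
        return tuple(c * s for c, s in zip(v, spacing))

    sa = [to_mm(v) for v in surface(a)]
    sb = [to_mm(v) for v in surface(b)]
    if not sa or not sb:
        return math.inf  # undefined when one mask is empty

    d_ab = [min(math.dist(p, q) for q in sb) for p in sa]
    d_ba = [min(math.dist(p, q) for q in sa) for p in sb]
    return max(percentile_nearest_rank(d_ab, 95),
               percentile_nearest_rank(d_ba, 95))

# Toy example: a 4x4x4 cube and the same cube shifted by one voxel along x.
reference = {(z, y, x) for z in range(4) for y in range(4) for x in range(4)}
prediction = {(z, y, x + 1) for (z, y, x) in reference}

print(dice(reference, prediction))
print(volumetric_similarity(reference, prediction))
print(hd95(reference, prediction))
```

In this toy case the two masks have equal volume, so VS is 1.0 even though the Dice score is only 0.75. This illustrates why a benchmark would combine an overlap metric, a boundary-distance metric, and a volume metric rather than rely on one alone.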

## Linked entities

- **Diseases:** neuroblastoma (MONDO:0005072)

## Full-text entities

- **Diseases:** Neuroblastoma (MESH:D009447), tumors (MESH:D009369)

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12649702/full.md

---
Source: https://tomesphere.com/paper/PMC12649702