# Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge

**Authors:** Hugo J. Kuijf, J. Matthijs Biesbroek, Jeroen de Bresser, Rutger Heinen, Simon Andermatt, Mariana Bento, Matt Berseth, Mikhail Belyaev, M. Jorge Cardoso, Adrià Casamitjana, D. Louis Collins, Mahsa Dadar, Achilleas Georgiou, Mohsen Ghafoorian, Dakai Jin, April Khademi, Jesse Knight, Hongwei Li, Xavier Lladó, Miguel Luna, Qaiser Mahmood, Richard McKinley, Alireza Mehrtash, Sébastien Ourselin, Bo-yong Park, Hyunjin Park, Sang Hyun Park, Simon Pezold, Elodie Puybareau, Leticia Rittner, Carole H. Sudre, Sergi Valverde, Verónica Vilaplana, Roland Wiest, Yongchao Xu, Ziyue Xu, Guodong Zeng, Jianguo Zhang, Guoyan Zheng, Christopher Chen, Wiesje van der Flier, Frederik Barkhof, Max A. Viergever, Geert Jan Biessels

arXiv: 1904.00682 · 2019-04-02

## TL;DR

This paper presents a standardized challenge for evaluating automatic white matter hyperintensity segmentation methods across multiple MRI scanners, highlighting top-performing algorithms and their robustness to scanner variability.

## Contribution

It introduces a public benchmark and evaluation framework for WMH segmentation methods, enabling objective comparison and assessment of generalization across scanners.

## Key findings

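- Of the twenty submitted methods, a cluster of four ranks significantly better than the rest.
- Within that cluster, one method is the clear winner.
- The inter-scanner robustness ranking shows that not all methods generalize to scanners that were not represented in the training data.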

## Abstract

Quantification of cerebral white matter hyperintensities (WMH) of presumed vascular origin is of key importance in many neurological research studies. Currently, measurements are often still obtained from manual segmentations on brain MR images, which is a laborious procedure. Automatic WMH segmentation methods exist, but a standardized comparison of the performance of such methods is lacking. We organized a scientific challenge, in which developers could evaluate their method on a standardized multi-center/-scanner image dataset, giving an objective comparison: the WMH Segmentation Challenge (https://wmh.isi.uu.nl/).

Sixty T1+FLAIR images from three MR scanners were released with manual WMH segmentations for training. A test set of 110 images from five MR scanners was used for evaluation. Segmentation methods had to be containerized and submitted to the challenge organizers. Five evaluation metrics were used to rank the methods: (1) Dice similarity coefficient, (2) modified Hausdorff distance (95th percentile), (3) absolute log-transformed volume difference, (4) sensitivity for detecting individual lesions, and (5) F1-score for individual lesions. Additionally, methods were ranked on their inter-scanner robustness.

Twenty participants submitted their method for evaluation. This paper provides a detailed analysis of the results. In brief, there is a cluster of four methods that rank significantly better than the other methods, with one clear winner. The inter-scanner robustness ranking shows that not all methods generalize to unseen scanners.

The challenge remains open for future submissions and provides a public platform for method evaluation.
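To make three of the five metrics concrete, the following is a minimal sketch of the Dice similarity coefficient and the lesion-wise sensitivity and F1-score. It is not the challenge's official evaluation code. The function names (`dice`, `components`, `lesion_recall_f1`), the representation of a mask as a set of `(x, y, z)` voxel coordinates, the use of 6-connectivity to define individual lesions, and the rule that a lesion counts as detected when it shares at least one voxel with the other mask are all assumptions made for illustration.

```python
from collections import deque

# Offsets of the six face-adjacent neighbours (assumed 6-connectivity).
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def dice(ref, pred):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    if not ref and not pred:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * len(ref & pred) / (len(ref) + len(pred))


def components(voxels):
    """Split a set of (x, y, z) voxels into connected components (lesions)."""
    remaining = set(voxels)
    lesions = []
    while remaining:
        seed = remaining.pop()
        lesion = {seed}
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed voxel
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    lesion.add(neighbour)
                    queue.append(neighbour)
        lesions.append(lesion)
    return lesions


def lesion_recall_f1(ref, pred):
    """Lesion-wise sensitivity (recall) and F1-score.

    Assumption: a lesion is 'detected' if it overlaps the other mask
    by at least one voxel.
    """
    ref_lesions = components(ref)
    pred_lesions = components(pred)
    detected = sum(1 for lesion in ref_lesions if lesion & pred)
    correct = sum(1 for lesion in pred_lesions if lesion & ref)
    recall = detected / len(ref_lesions) if ref_lesions else 1.0
    precision = correct / len(pred_lesions) if pred_lesions else 1.0
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    return recall, f1


if __name__ == "__main__":
    # Toy example: two reference lesions, two predicted lesions, one overlap.
    reference = {(0, 0, 0), (1, 0, 0), (5, 5, 5)}
    prediction = {(1, 0, 0), (2, 0, 0), (9, 9, 9)}
    print(dice(reference, prediction))
    print(lesion_recall_f1(reference, prediction))
```

In the toy example, only one of the two reference lesions is touched by the prediction and only one of the two predicted lesions touches the reference, so a voxel-wise score and a lesion-wise score can diverge. That is one plausible reason for ranking on both kinds of metric rather than on Dice alone.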

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.00682/full.md

## Figures

87 figures with captions in the complete paper: https://tomesphere.com/paper/1904.00682/full.md

## References

69 references — full list in the complete paper: https://tomesphere.com/paper/1904.00682/full.md

---
Source: https://tomesphere.com/paper/1904.00682