Confidence Intervals for Performance Estimates in Brain MRI Segmentation
R. El Jurdi, G. Varoquaux, O. Colliot

TL;DR
This paper investigates how many test images are needed to reliably estimate performance in 3D brain MRI segmentation, showing that fewer samples are needed than in classification tasks, with implications for reporting confidence in medical imaging.
Contribution
It provides an analysis of confidence interval widths in brain MRI segmentation, demonstrating that fewer test samples are sufficient for reliable estimates compared to classification.
Findings
Confidence intervals can be approximated by parametric methods for segmentation.
Fewer test images are needed for a given precision in segmentation than in classification.
Typically, 100-200 test samples suffice for a confidence interval that is 1% wide when the spread (the standard deviation of the performance measure across the test set) is low.
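The relation between test set size, spread, and interval width in the findings above can be sketched with the usual normal approximation, where the width of a two-sided interval is 2 * z * std / sqrt(n). This is a minimal illustration, not the authors' code: the function names `ci_width` and `required_n`, the 95% level, and the spread value of 0.03 are assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def ci_width(std, n, level=0.95):
    """Full width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return 2 * z * std / math.sqrt(n)


def required_n(std, width, level=0.95):
    """Smallest test set size whose interval is no wider than `width`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((2 * z * std / width) ** 2)


# Assumed, illustrative spread: a standard deviation of 0.03 (3 Dice points).
n = required_n(std=0.03, width=0.01)
print(n)                   # 139
print(ci_width(0.03, n))   # just under 0.01
```

Under these assumptions the required size falls inside the 100-200 range quoted above. Because the width shrinks only with the square root of n, doubling the spread quadruples the number of test images needed for the same width.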
Abstract
Medical segmentation models are evaluated empirically. As such an evaluation is based on a limited set of example images, it is unavoidably noisy. Beyond a mean performance measure, reporting confidence intervals is thus crucial. However, this is rarely done in medical image segmentation. The width of the confidence interval depends on the test set size and on the spread of the performance measure (its standard deviation across the test set). For classification, many test images are needed to avoid wide confidence intervals. Segmentation, however, has not been studied, and it differs in the amount of information brought by a given test image. In this paper, we study the typical confidence intervals in the context of segmentation in 3D brain magnetic resonance imaging (MRI). We carry out experiments using the standard nnU-net framework, two datasets from the Medical Decathlon challenge…
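To make the two kinds of interval concrete, the sketch below computes a parametric (normal-approximation) interval and a percentile bootstrap interval for the mean of a set of per-image scores. It is an illustration under stated assumptions, not the paper's experimental pipeline: the Dice scores are synthetic (drawn from a clipped Gaussian with mean 0.90 and standard deviation 0.03), and the helper names `parametric_ci` and `bootstrap_ci`, the number of resamples, and the seeds are hypothetical choices.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def parametric_ci(scores, level=0.95):
    """Normal-approximation interval: mean +/- z * std / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * stdev(scores) / math.sqrt(len(scores))
    m = mean(scores)
    return m - half, m + half


def bootstrap_ci(scores, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap interval for the mean score."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample the test set with replacement and record each resample's mean.
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]


# Synthetic per-image Dice scores (an assumption, not data from the paper).
rng = random.Random(42)
dice = [min(1.0, max(0.0, rng.gauss(0.90, 0.03))) for _ in range(150)]

p_lo, p_hi = parametric_ci(dice)
b_lo, b_hi = bootstrap_ci(dice)
print(f"parametric: [{p_lo:.4f}, {p_hi:.4f}]  width={p_hi - p_lo:.4f}")
print(f"bootstrap:  [{b_lo:.4f}, {b_hi:.4f}]  width={b_hi - b_lo:.4f}")
```

On data like this, the two intervals should have similar widths. The bootstrap makes no distributional assumption about the scores, so it is the natural reference when the score distribution is skewed or heavy-tailed.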
Taxonomy
Topics: Advanced Neural Network Applications · Radiomics and Machine Learning in Medical Imaging · Machine Learning and Data Classification
