TL;DR
This paper critically analyzes over 100 retinal vessel segmentation studies, identifies inconsistencies in performance reporting related to the field of view, and proposes numerical methods to correct biases for a more accurate ranking of algorithms.
Contribution
It introduces a systematic approach to detect and eliminate evaluation biases in retinal vessel segmentation benchmarks, improving the reliability of algorithm comparisons.
Findings
Most published rankings are based on non-comparable scores.
The highest accuracy achieved is 0.9582, close to human performance.
Evaluation biases, chiefly inconsistent use of the field of view (FoV) when computing scores, significantly affect reported performance metrics.
Abstract
In the last 15 years, the segmentation of vessels in retinal images has become an intensively researched problem in medical imaging, with hundreds of algorithms published. One of the de facto benchmarking data sets of vessel segmentation techniques is the DRIVE data set. Since DRIVE contains a predefined split of training and test images, the published performance results of the various segmentation techniques should provide a reliable ranking of the algorithms. Including more than 100 papers in the study, we performed a detailed numerical analysis of the coherence of the published performance scores. We found inconsistencies in the reported scores related to the use of the field of view (FoV), which has a significant impact on the performance scores. We attempted to eliminate the biases using numerical techniques to provide a more realistic picture of the state of the art. Based on the…
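To illustrate why the use of the FoV matters, here is a minimal sketch on synthetic data (not DRIVE images). It compares pixel accuracy computed inside a FoV mask with accuracy computed over the whole image. The function names `accuracy` and `rescale_to_fov` are hypothetical. The rescaling assumes that every pixel outside the FoV is a correctly classified background pixel; this is an illustrative simplification and not necessarily the correction used in the paper.

```python
import numpy as np

def accuracy(pred, truth, mask=None):
    """Pixel accuracy; if mask is given, only pixels where mask is True count."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if mask is None:
        mask = np.ones_like(truth, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    return float((pred[mask] == truth[mask]).mean())

def rescale_to_fov(acc_all, n_all, n_fov):
    """Convert whole-image accuracy to FoV-only accuracy.

    Assumption (illustrative): all n_all - n_fov pixels outside the FoV
    are true negatives, so they only inflate the count of correct pixels.
    """
    correct_all = acc_all * n_all
    return (correct_all - (n_all - n_fov)) / n_fov

# Synthetic 10x10 image; the FoV is the central 8x8 block (64 pixels).
fov = np.zeros((10, 10), dtype=bool)
fov[1:9, 1:9] = True

# Ground truth: one vertical vessel of 8 pixels inside the FoV.
truth = np.zeros((10, 10), dtype=bool)
truth[1:9, 4] = True

# Prediction: the vessel plus 4 false positives inside the FoV.
pred = truth.copy()
pred[1:5, 5] = True

acc_fov = accuracy(pred, truth, fov)  # 60 correct of 64 pixels
acc_all = accuracy(pred, truth)       # 96 correct of 100 pixels
acc_fov_est = rescale_to_fov(acc_all, n_all=fov.size, n_fov=int(fov.sum()))

print(f"accuracy inside FoV:  {acc_fov:.4f}")
print(f"accuracy whole image: {acc_all:.4f}")
print(f"rescaled to FoV:      {acc_fov_est:.4f}")
```

The same segmentation scores higher when the trivially correct background outside the FoV is counted, so scores reported under the two conventions cannot be ranked against each other without such an adjustment.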
