Detecting and refurbishing ground truth errors during training of deep learning-based echocardiography segmentation models
Iman Islam, Bram Ruijsink, Andrew J. Reader, Andrew P. King

TL;DR
This paper investigates the robustness of deep learning models for echocardiography segmentation to label errors and introduces a method using Variance of Gradients to detect and refurbish erroneous ground truth labels during training.
Contribution
The paper proposes a novel strategy for detecting and refurbishing erroneous ground truth labels during training, with the aim of improving model robustness to label errors in medical image segmentation.
Findings
Variance of Gradients (VOG) effectively flags erroneous ground truth labels during training.
U-Net maintains performance under moderate label errors.
Refurbishing labels improves model accuracy under high-error conditions.
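To make the VOG-based flagging concrete, here is a minimal NumPy sketch of a simplified VOG score. It assumes the input gradients for each training sample have already been saved at several training checkpoints. The function names, the array layout, and the "flag the top fraction" rule are illustrative assumptions and are not taken from the paper, whose exact formulation may differ.

```python
import numpy as np


def vog_scores(grads):
    """Simplified Variance of Gradients score per sample.

    grads: array of shape (K, N, H, W) holding the gradient of the model
    output with respect to each input pixel, saved at K checkpoints for
    N samples (hypothetical layout, assumed for this sketch).
    """
    grads = np.asarray(grads, dtype=np.float64)
    # Variance across checkpoints for every pixel: shape (N, H, W).
    per_pixel_var = grads.var(axis=0)
    # Average over pixels to get one scalar score per sample: shape (N,).
    return per_pixel_var.mean(axis=(1, 2))


def flag_suspects(scores, fraction=0.1):
    """Return sorted indices of the highest-scoring samples.

    Flagging a fixed fraction is an assumption made for illustration;
    a threshold on the score would work equally well here.
    """
    scores = np.asarray(scores)
    n_flag = max(1, int(round(fraction * len(scores))))
    order = np.argsort(scores)[::-1]  # highest score first
    return np.sort(order[:n_flag])
```

Samples whose gradients fluctuate strongly across checkpoints receive high scores and are treated as candidates for label errors.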
Abstract
Deep learning-based medical image segmentation typically relies on ground truth (GT) labels obtained through manual annotation, but these can be prone to random errors or systematic biases. This study examines the robustness of deep learning models to such errors in echocardiography (echo) segmentation and evaluates a novel strategy for detecting and refurbishing erroneous labels during model training. Using the CAMUS dataset, we simulate three error types, then compare a loss-based GT label error detection method with one based on Variance of Gradients (VOG). We also propose a pseudo-labelling approach to refurbish suspected erroneous GT labels. We assess the performance of our proposed approach under varying error levels. Results show that VOG proved highly effective in flagging erroneous GT labels during training. However, a standard U-Net maintained strong performance under random…
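The pseudo-labelling idea mentioned in the abstract can be sketched as follows: for samples flagged as suspicious, the GT mask is replaced by the model's own prediction. This is a minimal illustration under stated assumptions, not the paper's implementation; the function name, the array shapes, and the use of a plain per-pixel argmax are all hypothetical.

```python
import numpy as np


def refurbish_labels(gt_masks, pred_probs, flagged):
    """Replace flagged GT masks with pseudo-labels from model predictions.

    gt_masks:   integer array of shape (N, H, W) with class indices.
    pred_probs: float array of shape (N, C, H, W) with per-class
                probabilities from the current model (assumed layout).
    flagged:    1-D array of sample indices suspected to be erroneous.
    """
    gt_masks = np.asarray(gt_masks)
    pred_probs = np.asarray(pred_probs)
    flagged = np.asarray(flagged, dtype=int)

    # Pseudo-label: most probable class at every pixel, shape (N, H, W).
    pseudo = pred_probs.argmax(axis=1).astype(gt_masks.dtype)

    # Copy so the original annotations are left untouched.
    refurbished = gt_masks.copy()
    refurbished[flagged] = pseudo[flagged]
    return refurbished
```

Unflagged samples keep their original annotation, so only the suspected labels are changed in the next training round.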
