Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation
Yongkang He, Mingjin Chen, Zhijing Yang, Yongyi Lu

TL;DR
This paper introduces a data pruning method for medical image segmentation that uses training dynamics on target regions to shrink the dataset without sacrificing much accuracy, addressing the failure of existing gradient norm-based importance metrics on dense labeling tasks.
Contribution
It proposes a data importance measure for dense labeling tasks based on the Dynamic Average Dice (DAD) score. The authors describe the work as among the first to study data importance for dense labeling in medical image analysis.
Findings
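Pruning datasets using DAD scores removes a significant fraction of the data without sacrificing much segmentation accuracy.
Loss gradient norm-based metrics borrowed from image classification fail to identify important samples on standard medical image segmentation benchmarks.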
The proposed method provides a simple, effective baseline for data selection in medical segmentation.
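To make the gradient norm finding concrete, here is a minimal sketch of one such per-sample score. It assumes a binary segmentation model with a per-pixel sigmoid and a summed binary cross-entropy loss, for which the gradient with respect to the logits is the predicted probability minus the label. The function name `logit_grad_norm`, the toy values, and the choice to differentiate with respect to logits rather than model weights are illustrative assumptions, not the metric evaluated in the paper.

```python
import numpy as np

def logit_grad_norm(probs, mask):
    """L2 norm of d(sum of per-pixel BCE)/d(logits) for one image.

    For sigmoid + binary cross-entropy this gradient is (p - y) per pixel.
    Hypothetical helper for illustration only.
    """
    probs = np.asarray(probs, dtype=float)
    mask = np.asarray(mask, dtype=float)
    # Default norm of a 2-D array is Frobenius, i.e. L2 over all pixels.
    return float(np.linalg.norm(probs - mask))

# Toy 2x2 image with a single foreground pixel (made-up values).
mask = np.array([[1, 0], [0, 0]])
confident = np.array([[0.9, 0.1], [0.1, 0.1]])
uncertain = np.array([[0.5, 0.5], [0.5, 0.5]])

score_confident = logit_grad_norm(confident, mask)
score_uncertain = logit_grad_norm(uncertain, mask)
```

A score of this form sums over every pixel, target and background alike, whereas a Dice-based score is computed on the overlap with the target region.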
Abstract
This paper addresses dense labeling problems in which a significant fraction of the dataset can be pruned without sacrificing much accuracy. We observe that, on standard medical image segmentation benchmarks, the loss gradient norm-based metrics of individual training examples used in image classification fail to identify the important samples. To address this issue, we propose a data pruning method that takes into account the training dynamics on target regions using the Dynamic Average Dice (DAD) score. To the best of our knowledge, we are among the first to address data importance in dense labeling tasks in the field of medical image analysis, making the following contributions: (1) investigating the underlying causes with rigorous empirical analysis, and (2) determining an effective data pruning approach for dense labeling problems. Our solution can be used as a strong…
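The abstract does not give the DAD formula, so the following is only a sketch under stated assumptions: the score is taken to be a sample's Dice coefficient on the target mask, averaged over the predictions recorded at several training epochs, and pruning keeps a fixed fraction of samples ranked by that score. The helper names, the smoothing constant, and the choice to keep the lowest-scoring samples are all hypothetical.

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks (eps avoids 0/0)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def dynamic_average_dice(preds_per_epoch, target):
    """Mean Dice of one sample's predictions over the recorded epochs.

    Assumed reading of "Dynamic Average Dice"; not the paper's definition.
    """
    return float(np.mean([dice(p, target) for p in preds_per_epoch]))

def select_indices(scores, keep_fraction, keep_lowest=True):
    """Indices of samples to keep after pruning.

    keep_lowest=True retains low-scoring samples; which end of the
    ranking to keep is an assumption made for this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(keep_fraction * len(scores))))
    order = np.argsort(scores, kind="stable")
    if not keep_lowest:
        order = order[::-1]
    return np.sort(order[:n_keep])

# Toy data: one target mask, two samples tracked over two epochs.
target = np.array([[1, 1], [0, 0]])
sample_a = [target, target]                            # segmented correctly throughout
sample_b = [np.zeros((2, 2)), [[1, 0], [0, 0]]]         # learned late and only partly

dad_a = dynamic_average_dice(sample_a, target)
dad_b = dynamic_average_dice(sample_b, target)
kept = select_indices([dad_a, dad_b], keep_fraction=0.5)
```

In practice the per-epoch predictions would come from checkpoints or from logging during a single training run; the sketch leaves that part out.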
Taxonomy
Topics: Machine Learning and Data Classification · Medical Image Segmentation Techniques · Image Retrieval and Classification Techniques
Methods: Pruning
