Less is More: Selective Reduction of CT Data for Self-Supervised Pre-Training of Deep Learning Models with Contrastive Learning Improves Downstream Classification Performance
Daniel Wolf, Tristan Payer, Catharina Silvia Lisson, Christoph Gerhard Lisson, Meinrad Beer, Michael Götz, Timo Ropinski

TL;DR
This paper shows that selectively reducing redundancy in CT pre-training datasets, using strategies based on deep embedding, information theory, and hashing, improves contrastive self-supervised pre-training, yielding better downstream classification performance and shorter pre-training times in medical imaging.
Contribution
The study introduces a novel dataset reduction approach for contrastive pre-training that improves downstream task accuracy and efficiency in medical image analysis.
Findings
Dataset reduction improves AUC scores across multiple medical classification tasks.
Pre-training with a reduced dataset is up to nine times faster.
Selective dataset reduction enhances the effectiveness of contrastive learning in medical imaging.
Abstract
Self-supervised pre-training of deep learning models with contrastive learning is a widely used technique in image analysis. Current findings indicate a strong potential for contrastive pre-training on medical images. However, further research is necessary to incorporate the particular characteristics of these images. We hypothesize that the similarity of medical images hinders the success of contrastive learning in the medical imaging domain. To this end, we investigate different strategies based on deep embedding, information theory, and hashing in order to identify and reduce redundancy in medical pre-training datasets. The effect of these different reduction strategies on contrastive learning is evaluated on two pre-training datasets and several downstream classification tasks. In all of our experiments, dataset reduction leads to a considerable performance gain in downstream tasks,…
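As a rough illustration of the hashing-based family of reduction strategies mentioned in the abstract, the following Python sketch drops near-duplicate 2D slices using a simple average hash and a Hamming-distance threshold. This is not the authors' implementation: the function names `average_hash` and `reduce_slices`, the 8x8 hash size, and the `min_hamming` threshold are all assumptions chosen for illustration, and the paper's actual hashing method and thresholds may differ.

```python
import numpy as np


def average_hash(slice_2d, hash_size=8):
    """Return a flat boolean array of hash_size * hash_size bits (hypothetical helper)."""
    h, w = slice_2d.shape
    # Crop so both dimensions divide evenly into hash_size blocks.
    h_crop, w_crop = h - h % hash_size, w - w % hash_size
    cropped = slice_2d[:h_crop, :w_crop].astype(np.float64)
    # Downsample by averaging each block of pixels.
    blocks = cropped.reshape(
        hash_size, h_crop // hash_size, hash_size, w_crop // hash_size
    )
    small = blocks.mean(axis=(1, 3))
    # One bit per block: is the block brighter than the global mean?
    return (small > small.mean()).ravel()


def reduce_slices(slices, min_hamming=6):
    """Greedily keep a slice only if its hash differs from every
    already-kept hash by at least min_hamming bits (assumed threshold)."""
    kept_indices, kept_hashes = [], []
    for i, s in enumerate(slices):
        hsh = average_hash(s)
        if all(np.count_nonzero(hsh != k) >= min_hamming for k in kept_hashes):
            kept_indices.append(i)
            kept_hashes.append(hsh)
    return kept_indices


if __name__ == "__main__":
    a = np.zeros((64, 64))
    a[:, :32] = 1.0          # bright left half
    b = a + 0.001            # near-duplicate of a
    c = np.zeros((64, 64))
    c[:32, :] = 1.0          # bright top half, clearly different
    print(reduce_slices([a, b, c]))
```

In this toy setting the near-duplicate slice is discarded while the two structurally different slices are retained. The greedy comparison is quadratic in the number of kept slices, so a real pipeline over full CT volumes would likely need a more scalable lookup.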
Taxonomy
Methods: Contrastive Learning
