Cats, not CAT scans: a study of dataset similarity in transfer learning for 2D medical image classification

Irma van den Brandt; Floris Fok; Bas Mulders; Joaquin Vanschoren; Veronika Cheplygina

arXiv:2107.05940·cs.CV·July 14, 2021·6 cites


Open Access · 1 Repo

TL;DR

This study systematically compares nine source datasets of natural or medical images for transfer learning to three 2D medical image classification targets. ImageNet leads to the highest performance, but larger source datasets are not necessarily better, and common intuitions about dataset similarity do not reliably predict a good source.

Contribution

The paper provides a systematic analysis of how the choice of source dataset affects transfer learning performance in 2D medical image classification, challenging common assumptions about dataset size and similarity.

Findings

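01. Among the nine source datasets compared, ImageNet yields the highest classification performance.

02. Larger source datasets are not necessarily better for transfer learning.

03. Common intuitions about data similarity may be inaccurate, so they are not sufficient to predict an appropriate source a priori.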

Abstract

Transfer learning is a commonly used strategy for medical image classification, especially via pretraining on source data and fine-tuning on target data. There is currently no consensus on how to choose appropriate source data, and in the literature we can find both evidence of favoring large natural image datasets such as ImageNet, and evidence of favoring more specialized medical datasets. In this paper we perform a systematic study with nine source datasets with natural or medical images, and three target medical datasets, all with 2D images. We find that ImageNet is the source leading to the highest performances, but also that larger datasets are not necessarily better. We also study different definitions of data similarity. We show that common intuitions about similarity may be inaccurate, and therefore not sufficient to predict an appropriate source a priori. Finally, we discuss…
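The study design described in the abstract (nine sources, three targets, pretrain on each source and fine-tune on each target) can be pictured as a grid of source-target pairs. The following is a minimal sketch of that enumeration only, not the authors' code. The abstract names only ImageNet, so every other dataset label here is a hypothetical placeholder, as is the function name `experiment_grid`.

```python
from itertools import product

# Only "imagenet" comes from the abstract; the remaining labels are
# hypothetical placeholders standing in for the unnamed datasets.
SOURCES = ["imagenet"] + [f"source_{i}" for i in range(2, 10)]  # nine sources
TARGETS = [f"target_{i}" for i in range(1, 4)]                  # three targets


def experiment_grid(sources, targets):
    """Return every (source, target) pair: pretrain on source, fine-tune on target."""
    return list(product(sources, targets))


grid = experiment_grid(SOURCES, TARGETS)
print(len(grid))  # 9 sources x 3 targets = 27 pairs
```

Each pair would correspond to one pretraining and fine-tuning run; comparing results along the source axis for a fixed target is what lets the paper ask which source transfers best.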


Code & Models

Repositories

vcheplygina/cats-scans · pytorch · Official


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · AI in cancer detection