Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets
Kumar Abhishek, Aditi Jain, Ghassan Hamarneh

TL;DR
This paper critically examines three dermatological image datasets (DermaMNIST, its source HAM10000, and Fitzpatrick17k), identifies data quality issues that affect model benchmarking, and proposes corrections to improve dataset reliability.
Contribution
It provides a detailed analysis of data quality problems in popular dermatological datasets and offers corrections to enhance their reliability for deep learning applications.
Findings
Presence of duplicates and data leakage identified
Mislabeled images and test set issues uncovered
Corrections improve dataset quality and benchmarking accuracy
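To make the notion of duplicates leaking across train-test partitions concrete, here is a minimal sketch of one way such a check could look. It is not the authors' method, which this summary does not describe. It assumes each partition is available as a mapping from image ID to raw file bytes, and it only catches byte-identical copies; near-duplicates (resized, recompressed, or re-photographed images) would need a perceptual or embedding-based comparison instead. The function name `find_cross_partition_duplicates` and the toy image IDs are hypothetical.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Return a SHA-256 digest of the raw image bytes."""
    return hashlib.sha256(data).hexdigest()


def find_cross_partition_duplicates(partitions):
    """Find byte-identical images that appear in more than one partition.

    partitions: dict mapping partition name -> dict of image_id -> raw bytes
    (an assumed layout, chosen only for this illustration).
    Returns a sorted list of groups; each group is a sorted list of
    (partition, image_id) tuples sharing the same content hash.
    """
    seen = defaultdict(set)  # digest -> {(partition, image_id), ...}
    for part, images in partitions.items():
        for image_id, data in images.items():
            seen[content_hash(data)].add((part, image_id))

    leaks = []
    for members in seen.values():
        # A group is a leak only if it spans at least two partitions.
        if len({part for part, _ in members}) > 1:
            leaks.append(sorted(members))
    return sorted(leaks)


# Toy data: img_001 (train) and img_101 (test) have identical bytes.
partitions = {
    "train": {"img_001": b"\x00\x01\x02", "img_002": b"\x03\x04\x05"},
    "test": {"img_101": b"\x00\x01\x02", "img_102": b"\x06\x07\x08"},
}
leaks = find_cross_partition_duplicates(partitions)
print(leaks)
```

On the toy data, the only group reported is the pair `img_001` / `img_101`, since their bytes match and they sit in different partitions. Duplicates that stay within a single partition are deliberately ignored here, because they inflate dataset size but do not by themselves contaminate the test set.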
Abstract
The remarkable progress of deep learning in dermatological tasks has brought us closer to achieving diagnostic accuracies comparable to those of human experts. However, while large datasets play a crucial role in the development of reliable deep neural network models, the quality of the data therein and their correct usage are of paramount importance. Several factors can impact data quality, such as the presence of duplicates, data leakage across train-test partitions, mislabeled images, and the absence of a well-defined test partition. In this paper, we conduct meticulous analyses of three popular dermatological image datasets (DermaMNIST, its source HAM10000, and Fitzpatrick17k), uncover these data quality issues, measure the effects of these problems on the benchmark results, and propose corrections to the datasets. Besides ensuring the reproducibility of our analysis, by making our…
Taxonomy
Topics: Cutaneous Melanoma Detection and Management · Autoimmune and Inflammatory Disorders · Nonmelanoma Skin Cancer Studies
