TL;DR
This study assesses how deep neural networks trained on clinical skin images perform across different skin tones, highlighting biases due to dataset imbalances and evaluating skin tone identification methods.
Contribution
It introduces Fitzpatrick skin type annotations for a large dermatology image dataset and analyzes model accuracy across skin types, addressing representation bias.
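Analyzing accuracy across skin types amounts to a stratified evaluation: prediction accuracy is computed separately within each Fitzpatrick group instead of over the whole test set. The sketch below shows that idea. The function name `accuracy_by_skin_type`, the flat-list input format, and the integer encoding of types I to VI as 1 to 6 are assumptions for illustration, not the authors' code.

```python
from collections import defaultdict


def accuracy_by_skin_type(y_true, y_pred, skin_types):
    """Return {fitzpatrick_type: accuracy} over the images of each type.

    Hypothetical helper. y_true / y_pred hold one condition label per image;
    skin_types holds one Fitzpatrick type per image, assumed encoded as 1-6.
    """
    if not (len(y_true) == len(y_pred) == len(skin_types)):
        raise ValueError("inputs must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, fst in zip(y_true, y_pred, skin_types):
        total[fst] += 1
        correct[fst] += int(truth == pred)
    # Report types in ascending order (I..VI) for easy comparison.
    return {fst: correct[fst] / total[fst] for fst in sorted(total)}


if __name__ == "__main__":
    y_true = ["psoriasis", "eczema", "acne", "eczema", "acne"]
    y_pred = ["psoriasis", "acne", "acne", "eczema", "eczema"]
    types = [1, 1, 2, 5, 5]
    print(accuracy_by_skin_type(y_true, y_pred, types))  # {1: 0.5, 2: 1.0, 5: 0.5}
```

Reporting one number per type, rather than a single overall accuracy, is what makes a gap between light and dark skin visible; an aggregate score can look strong while the smallest groups perform poorly.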
Findings
Models perform best on skin types similar to training data.
Significant underrepresentation of dark skin types in the dataset.
An algorithmic skin tone identification method is evaluated against the human Fitzpatrick annotations.
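One simple way to compare an algorithmic skin tone labeling with human annotations is to measure how often the two agree on the same images. The sketch below reports exact agreement and agreement within one Fitzpatrick type. The function name `label_agreement`, the integer 1 to 6 encoding, and the choice of a one-type tolerance are assumptions for illustration; the source does not say which agreement metric was used.

```python
def label_agreement(algorithmic, human):
    """Compare two Fitzpatrick labelings of the same images (hypothetical helper).

    Labels are assumed to be ints 1-6, aligned by index. Returns
    (exact, within_one): the fraction of images with identical labels and
    the fraction whose labels differ by at most one type.
    """
    if len(algorithmic) != len(human) or not human:
        raise ValueError("need two non-empty labelings of equal length")
    n = len(human)
    pairs = list(zip(algorithmic, human))
    exact = sum(a == h for a, h in pairs) / n
    within_one = sum(abs(a - h) <= 1 for a, h in pairs) / n
    return exact, within_one


if __name__ == "__main__":
    algo = [1, 2, 4, 6, 3]
    human = [1, 3, 2, 6, 3]
    print(label_agreement(algo, human))  # (0.6, 0.8)
```

The off-by-one figure is included because neighbouring Fitzpatrick types are often hard to tell apart even for human annotators, so exact agreement alone can understate how close two labelings are.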
Abstract
How does the accuracy of deep neural network models trained to classify clinical images of skin conditions vary across skin color? While recent studies demonstrate computer vision models can serve as a useful decision support tool in healthcare and provide dermatologist-level classification on a number of specific tasks, darker skin is underrepresented in the data. Most publicly available data sets do not include Fitzpatrick skin type labels. We annotate 16,577 clinical images sourced from two dermatology atlases with Fitzpatrick skin type labels and open-source these annotations. Based on these labels, we find that there are significantly more images of light skin types than dark skin types in this dataset. We train a deep neural network model to classify 114 skin conditions and find that the model is most accurate on skin types similar to those it was trained on. In addition, we…
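Testing whether a model is most accurate on skin types similar to those it was trained on requires a split by skin type: train on images of some types, then evaluate on every type, including a held-out share of the training types. The sketch below sets up such a split and counts images per type to expose imbalance. The record format `(image_id, fitzpatrick_type)`, the function name `split_by_skin_type`, and the 20% holdout default are hypothetical choices, not the paper's exact protocol.

```python
import random
from collections import Counter


def split_by_skin_type(records, train_types, holdout_frac=0.2, seed=0):
    """Train on images of `train_types`; build test sets for every type.

    Hypothetical helper. records: list of (image_id, fitzpatrick_type) pairs.
    A `holdout_frac` share of training-type images is held out so the model is
    also tested on types it saw during training.
    Returns (train_ids, test_ids_by_type).
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    in_train = [r for r in records if r[1] in train_types]
    rng.shuffle(in_train)
    n_hold = int(len(in_train) * holdout_frac)
    held_out, train = in_train[:n_hold], in_train[n_hold:]
    test = held_out + [r for r in records if r[1] not in train_types]
    test_by_type = {}
    for image_id, fst in test:
        test_by_type.setdefault(fst, []).append(image_id)
    return [image_id for image_id, _ in train], test_by_type


if __name__ == "__main__":
    records = [(f"img{i}", t) for i, t in enumerate([1, 1, 2, 2, 2, 3, 5, 6])]
    print(Counter(t for _, t in records))  # number of images per skin type
    train_ids, test_by_type = split_by_skin_type(records, train_types={1, 2})
    print(len(train_ids), {t: len(ids) for t, ids in sorted(test_by_type.items())})
```

Per-type accuracy on `test_by_type` can then be compared between the trained types and the unseen ones.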
