Evaluating Machine Learning-based Skin Cancer Diagnosis
Tanish Jain

TL;DR
This study assesses the reliability, explainability, and fairness of deep learning models for skin cancer detection using dermatoscopic images, highlighting strengths and disparities across skin tones and lesion types.
Contribution
The study evaluates the performance, explainability, and fairness of two CNN models, and introduces a postprocessing method to improve fairness across skin tones.
Findings
Both models highlight relevant features for most lesion types.
Both models are fair across sex but show disparities across skin tones.
Postprocessing improves fairness by reducing differences in false negative rate across skin tone groups.
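To make the fairness findings concrete, here is a minimal sketch of how Equalized Odds gaps can be measured for a binary dangerous-vs-benign classifier. It is not the paper's code: the function names (`group_rates`, `equalized_odds_gaps`), the group labels, and the toy predictions are all hypothetical, and the paper's postprocessing step is not reproduced here.

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Return {group: (tpr, fpr)} for binary labels (1 = dangerous, 0 = benign)."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        in_group = groups == g
        pos = in_group & (y_true == 1)   # truly dangerous lesions in this group
        neg = in_group & (y_true == 0)   # truly benign lesions in this group
        tpr = float((y_pred[pos] == 1).mean()) if pos.any() else float("nan")
        fpr = float((y_pred[neg] == 1).mean()) if neg.any() else float("nan")
        rates[g.item()] = (tpr, fpr)
    return rates

def equalized_odds_gaps(y_true, y_pred, groups):
    """Largest between-group differences in FNR (= 1 - TPR) and in FPR."""
    rates = group_rates(y_true, y_pred, groups)
    fnrs = [1.0 - tpr for tpr, _ in rates.values()]
    fprs = [fpr for _, fpr in rates.values()]
    return {"fnr_gap": max(fnrs) - min(fnrs), "fpr_gap": max(fprs) - min(fprs)}

# Toy data (assumed, not from HAM10000): two hypothetical skin tone groups.
y_true = [1, 1, 1, 1, 0, 0, 0, 0,  1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1,  1, 1, 0, 0, 0, 0, 0, 0]
groups = ["group_a"] * 8 + ["group_b"] * 8

rates = group_rates(y_true, y_pred, groups)
gaps = equalized_odds_gaps(y_true, y_pred, groups)
print(rates)
print(gaps)
```

A classifier satisfies Equalized Odds exactly when both gaps are zero. In a screening setting the false negative rate gap is usually the more consequential of the two, since a false negative is a missed dangerous lesion.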
Abstract
This study evaluates the reliability of two deep learning models for skin cancer detection, focusing on their explainability and fairness. Using the HAM10000 dataset of dermatoscopic images, the research assesses two convolutional neural network architectures: a MobileNet-based model and a custom CNN model. Both models are evaluated for their ability to classify skin lesions into seven categories and to distinguish between dangerous and benign lesions. Explainability is assessed using Saliency Maps and Integrated Gradients, with results interpreted by a dermatologist. The study finds that both models generally highlight relevant features for most lesion types, although they struggle with certain classes like seborrheic keratoses and vascular lesions. Fairness is evaluated using the Equalized Odds metric across sex and skin tone groups. While both models demonstrate fairness across sex…
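As an illustration of the Integrated Gradients attribution method mentioned in the abstract, the sketch below applies it to a toy logistic model with an analytic gradient. This is an assumption-laden stand-in, not the paper's setup: the weights, input, all-zero baseline, and step count are hypothetical, and the paper's models are CNNs operating on images rather than three-feature vectors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy logistic model standing in for a classifier's output probability."""
    return sigmoid(x @ w + b)

def model_grad(x, w, b):
    """Analytic gradient of the model output with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate the path integral of gradients from baseline to x
    with a midpoint Riemann sum, then scale by (x - baseline)."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack(
        [model_grad(baseline + a * (x - baseline), w, b) for a in alphas]
    )
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical parameters and input, chosen only for illustration.
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([0.8, 0.2, 0.5])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w, b)
output_change = model(x, w, b) - model(baseline, w, b)
print(attributions, attributions.sum(), output_change)
```

The attributions should sum to approximately the change in model output between the baseline and the input (the completeness property), which gives a quick sanity check on any implementation. For image models the same idea applies per pixel, and the resulting map is what a dermatologist would inspect.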
Taxonomy
Topics: Cutaneous Melanoma Detection and Management
