# Deep neural network or dermatologist?

**Authors:** Kyle Young, Gareth Booth, Becks Simpson, Reuben Dutton, Sally Shrapnel

arXiv: 1908.06612 · 2019-11-13

## TL;DR

This study evaluates the reliability of Grad-CAM and Kernel SHAP interpretability methods on deep neural networks trained for melanoma detection, revealing inconsistencies and non-relevant feature attributions despite high accuracy.

## Contribution

The paper assesses the reliability of two local interpretability methods, Grad-CAM and Kernel SHAP, on melanoma detection models, using a reproducible end-to-end training framework and a suite of 30 models trained on HAM10000.

## Key findings

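- Despite high accuracy, models occasionally assign importance to features that are not relevant to the diagnostic task.
- Models of similar accuracy produce different explanations, as measured by Grad-CAM and Kernel SHAP.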
- Interpretability methods may not reliably reflect true model reasoning.
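
One simple way to make "different explanations" concrete is to measure how much the most important pixels of two attribution maps overlap. The sketch below is illustrative only: this summary does not say which comparison the authors used, and the names `top_k_pixels`, `top_k_iou`, `map_a`, and `map_b`, along with the toy values, are assumptions.

```python
def top_k_pixels(attribution, k):
    """Return the (row, col) positions of the k largest attribution values."""
    flat = [
        (value, (i, j))
        for i, row in enumerate(attribution)
        for j, value in enumerate(row)
    ]
    flat.sort(key=lambda item: item[0], reverse=True)
    return {pos for _, pos in flat[:k]}


def top_k_iou(map_a, map_b, k):
    """Intersection-over-union of the top-k pixels of two attribution maps (k >= 1)."""
    a = top_k_pixels(map_a, k)
    b = top_k_pixels(map_b, k)
    return len(a & b) / len(a | b)


# Hypothetical 2x2 attribution maps from two models for the same image.
map_a = [[0.9, 0.1], [0.8, 0.0]]
map_b = [[0.7, 0.6], [0.1, 0.0]]

overlap = top_k_iou(map_a, map_b, k=2)  # shares one of three distinct top pixels
```

A value near 1 means the two models emphasise the same regions; a value near 0 means their explanations disagree even if their accuracy is similar.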

## Abstract

Deep learning techniques have demonstrated high accuracy for identifying melanoma in digitised dermoscopic images. A strength is that these methods are not constrained by features that are pre-defined by human semantics. A downside is that it is difficult to understand the rationale of the model predictions and to identify potential failure modes. This is a major barrier to adoption of deep learning in clinical practice. In this paper we ask if two existing local interpretability methods, Grad-CAM and Kernel SHAP, can shed light on convolutional neural networks trained in the context of melanoma detection. Our contributions are: (i) we first explore the domain space via a reproducible, end-to-end learning framework that creates a suite of 30 models, all trained on a publicly available data set (HAM10000); (ii) we next explore the reliability of Grad-CAM and Kernel SHAP in this context via some basic sanity check experiments; (iii) finally, we investigate a random selection of models from our suite using Grad-CAM and Kernel SHAP. We show that despite high accuracy, the models will occasionally assign importance to features that are not relevant to the diagnostic task. We also show that models of similar accuracy will produce different explanations as measured by these methods. This work represents first steps in bridging the gap between model accuracy and interpretability in the domain of skin cancer classification.
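
For readers unfamiliar with Grad-CAM, the following is a minimal sketch of its core computation, not the authors' pipeline. Each feature map of a convolutional layer is weighted by the spatial average of the class-score gradient with respect to that map, the weighted maps are summed, and a ReLU keeps only positive evidence. The function name `grad_cam` and the toy `activations` and `gradients` values are assumptions; in practice both would be extracted from a trained network with a deep learning framework.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from K feature maps and their gradients.

    activations, gradients: lists of K maps, each an H x W nested list.
    """
    # Channel weights: global average of each gradient map.
    weights = []
    for grad_map in gradients:
        total = sum(sum(row) for row in grad_map)
        weights.append(total / (len(grad_map) * len(grad_map[0])))

    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]

    # Weighted sum of the feature maps.
    for alpha, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]

    # ReLU: keep only regions that push the class score up.
    return [[max(0.0, value) for value in row] for row in cam]


# Toy example with two 2x2 feature maps (values are made up).
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # channel 0 supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel 1 counts against it
]

heatmap = grad_cam(activations, gradients)
```

In this toy case the heatmap is non-zero only where the supporting channel is active, which is the kind of localisation a clinician would then compare against the lesion itself.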

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.06612/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1908.06612/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/1908.06612/full.md

---
Source: https://tomesphere.com/paper/1908.06612