Exploring Advances in Transformers and CNN for Skin Lesion Diagnosis on Small Datasets

Leandro M. de Lima; Renato A. Krohling

arXiv:2205.15442 · cs.CV · January 13, 2023

TL;DR

This paper evaluates recent Transformer and CNN architectures, along with multimodal fusion techniques, for skin lesion diagnosis on small datasets, and reports state-of-the-art balanced accuracy on the PAD-UFES-20 dataset.

Contribution

It presents a comprehensive evaluation of Transformer-based models and fusion methods for skin lesion diagnosis, highlighting their effectiveness over traditional CNNs.

Findings

01

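PiT ( $0.800 \pm 0.006$ ), CoaT ( $0.780 \pm 0.024$ ), and ViT ( $0.771 \pm 0.018$ ) backbones with MetaBlock fusion achieved the top balanced accuracy scores on PAD-UFES-20.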

02

Multimodal feature fusion improved diagnosis performance.

03

Transformer-based architectures outperform CNNs on small datasets.

Abstract

Skin cancer is one of the most common types of cancer in the world. Different computer-aided diagnosis systems have been proposed to tackle skin lesion diagnosis, most of them based on deep convolutional neural networks. However, recent advances in computer vision, notably Transformer-based networks, have achieved state-of-the-art results in many tasks. We explore and evaluate advances in computer vision architectures, training methods, and multimodal feature fusion for the skin lesion diagnosis task. Experiments show that PiT ( $0.800 \pm 0.006$ ), CoaT ( $0.780 \pm 0.024$ ), and ViT ( $0.771 \pm 0.018$ ) backbone models with MetaBlock fusion achieved state-of-the-art results for the balanced accuracy metric on the PAD-UFES-20 dataset.
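
For readers unfamiliar with the metric, balanced accuracy is the mean of the per-class recall, which keeps a majority class from dominating the score on imbalanced data. The following is a minimal sketch using only the Python standard library. The function names, the labels, and the per-run scores are hypothetical and purely illustrative, and it assumes the $\pm$ values above denote a standard deviation over repeated runs or folds, which this abstract does not state.

```python
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all samples whose true label is this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return mean(recalls)


def summarize(scores):
    """Return (mean, sample standard deviation) of per-run scores."""
    return mean(scores), stdev(scores)


# Hypothetical labels, not taken from PAD-UFES-20.
y_true = ["benign"] * 4 + ["malignant"] * 2
y_pred = ["benign"] * 4 + ["malignant", "benign"]

# Recall is 4/4 for "benign" and 1/2 for "malignant".
score = balanced_accuracy(y_true, y_pred)

# Hypothetical per-run scores, summarized as mean and standard deviation.
avg, spread = summarize([0.79, 0.80, 0.81])
```

In this toy case plain accuracy would be 5/6, while balanced accuracy is 0.75, because the missed minority-class sample costs half of that class's recall.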

Taxonomy

Methods: Co-Scale Conv-attentional Image Transformer