Exploring Advances in Transformers and CNN for Skin Lesion Diagnosis on Small Datasets
Leandro M. de Lima, Renato A. Krohling

TL;DR
This paper evaluates recent Transformer and CNN architectures, along with multimodal fusion techniques, for skin lesion diagnosis on small datasets, reporting state-of-the-art balanced accuracy on the PAD-UFES-20 dataset.
Contribution
It introduces a comprehensive evaluation of Transformer-based models and fusion methods for skin lesion diagnosis, highlighting their effectiveness over traditional CNNs.
Findings
PiT, CoaT, and ViT backbones with MetaBlock fusion achieved the top balanced accuracy scores on PAD-UFES-20.
Multimodal feature fusion improved diagnosis performance.
Transformer-based architectures outperform CNNs on small datasets.
Abstract
Skin cancer is one of the most common types of cancer in the world. Different computer-aided diagnosis systems have been proposed to tackle skin lesion diagnosis, most of them based on deep convolutional neural networks. However, recent advances in computer vision, notably Transformer-based networks, have achieved state-of-the-art results in many tasks. We explore and evaluate advances in computer vision architectures, training methods and multimodal feature fusion for the skin lesion diagnosis task. Experiments show that PiT, CoaT and ViT backbone models with MetaBlock fusion achieved state-of-the-art results for the balanced accuracy metric on the PAD-UFES-20 dataset.
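The abstract reports results in terms of balanced accuracy. As a minimal sketch (not the authors' evaluation code), balanced accuracy can be computed as the mean of per-class recall, which keeps a frequent class from dominating the score on an imbalanced dataset. The function name `balanced_accuracy` and the class labels `"common"` and `"rare"` below are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # number of correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, deliberately imbalanced labels: 8 "common" and 2 "rare" samples.
y_true = ["common"] * 8 + ["rare"] * 2
# The classifier gets every "common" sample right but misses one of the two "rare" ones.
y_pred = ["common"] * 8 + ["common", "rare"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy:    {plain:.2f}")
print(f"balanced accuracy: {balanced:.2f}")
```

In this toy case plain accuracy is 9/10, while balanced accuracy is the average of the two per-class recalls (8/8 and 1/2), so the missed rare-class sample weighs much more heavily.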
Taxonomy
Methods: Co-Scale Conv-attentional Image Transformer
