Fusion of Foundation and Vision Transformer Model Features for Dermatoscopic Image Classification

Amirreza Mahbod; Rupert Ecker; Ramona Woitek

arXiv:2505.16338·cs.CV·May 23, 2025

TL;DR

This study compares a dermatology-specific foundation model (PanDerm) with fine-tuned Vision Transformer models for skin lesion classification, and shows that fusing PanDerm and Swin Transformer predictions improves classification performance on the HAM10000 and MSKCC datasets.

Contribution

The paper introduces a fusion approach that combines foundation model (PanDerm) features with Vision Transformer outputs to improve skin lesion classification.

Findings

01

A PanDerm-based MLP performs comparably to a fully fine-tuned Swin Transformer V2 base model.

02

Fusing PanDerm and Swin Transformer predictions further improves classification performance.

03

Using frozen features with non-linear probing is effective for classification.
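
Finding 03 refers to non-linear probing: the foundation model stays frozen and only a small non-linear classifier is trained on its feature vectors. The following is a minimal NumPy sketch of that idea, not the authors' implementation. Synthetic Gaussian clusters stand in for frozen PanDerm features, and the function names (`train_mlp_probe`, `predict_proba`), hidden size, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np


def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_mlp_probe(feats, labels, n_classes, hidden=32, lr=0.1, epochs=300, seed=0):
    """Train a one-hidden-layer MLP on fixed (frozen) feature vectors."""
    rng = np.random.default_rng(seed)
    n, d = feats.shape
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, n_classes))
    b2 = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        # Forward pass: ReLU hidden layer, softmax output.
        h = np.maximum(feats @ W1 + b1, 0.0)
        p = softmax(h @ W2 + b2)
        # Backward pass for the mean cross-entropy loss.
        g_logits = (p - onehot) / n
        g_h = (g_logits @ W2.T) * (h > 0)
        # Full-batch gradient descent; only the probe's weights change.
        W2 -= lr * (h.T @ g_logits)
        b2 -= lr * g_logits.sum(axis=0)
        W1 -= lr * (feats.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return W1, b1, W2, b2


def predict_proba(params, feats):
    W1, b1, W2, b2 = params
    return softmax(np.maximum(feats @ W1 + b1, 0.0) @ W2 + b2)


# Synthetic stand-in for frozen foundation-model features (assumption):
# three well-separated Gaussian clusters in a 16-dimensional space.
rng = np.random.default_rng(0)
n_classes, dim, per_class = 3, 16, 60
centers = rng.normal(0.0, 3.0, (n_classes, dim))
labels = np.repeat(np.arange(n_classes), per_class)
feats = centers[labels] + rng.normal(0.0, 1.0, (labels.size, dim))
# Standardize each feature dimension before probing.
feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)

params = train_mlp_probe(feats, labels, n_classes)
acc = (predict_proba(params, feats).argmax(axis=1) == labels).mean()
print(f"training accuracy: {acc:.2f}")
```

In a real setting the `feats` array would come from a single forward pass of the frozen encoder over the images, so the probe can be retrained cheaply without touching the encoder.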

Abstract

Accurate classification of skin lesions from dermatoscopic images is essential for the diagnosis and treatment of skin cancer. In this study, we investigate the utility of a dermatology-specific foundation model, PanDerm, in comparison with two Vision Transformer (ViT) architectures (ViT base and Swin Transformer V2 base) for the task of skin lesion classification. Using frozen features extracted from PanDerm, we apply non-linear probing with three different classifiers, namely, multi-layer perceptron (MLP), XGBoost, and TabNet. For the ViT-based models, we perform full fine-tuning to optimize classification performance. Our experiments on the HAM10000 and MSKCC datasets demonstrate that the PanDerm-based MLP model performs comparably to the fine-tuned Swin Transformer model, while fusion of PanDerm and Swin Transformer predictions leads to further performance improvements. Future work will…
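
The abstract reports that fusing PanDerm and Swin Transformer predictions improves performance, but this page does not state the fusion rule. As a hedged illustration only, the sketch below assumes a weighted average of the two models' class-probability vectors; the function name `fuse_predictions`, the `weight_a` parameter, and the example probabilities are hypothetical and are not taken from the paper.

```python
def fuse_predictions(prob_a, prob_b, weight_a=0.5):
    """Weighted average of two class-probability vectors (assumed fusion rule)."""
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must predict the same number of classes")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    fused = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]
    # Renormalize so the fused vector sums to one despite rounding.
    total = sum(fused)
    return [p / total for p in fused]


# Made-up probabilities for one image over three lesion classes.
panderm_mlp_probs = [0.50, 0.30, 0.20]  # hypothetical PanDerm-based MLP output
swin_probs = [0.20, 0.60, 0.20]         # hypothetical fine-tuned Swin output

fused = fuse_predictions(panderm_mlp_probs, swin_probs)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted_class)  # fused is approximately [0.35, 0.45, 0.20]; class 1
```

Averaging probabilities is one common late-fusion choice; if one model is known to be stronger on validation data, `weight_a` could be tuned there rather than fixed at 0.5.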

Taxonomy

Methods: Attention Is All You Need · Stochastic Depth · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Gated Linear Unit · Softmax · Swin Transformer · Position-Wise Feed-Forward Layer