# Transformer-Based Foundation Learning for Robust and Data-Efficient Skin Disease Imaging

**Authors:** Inzamam Mashood Nasir, Hend Alshaya, Sara Tehsin, Wided Bouchelligua

PMC · DOI: 10.3390/diagnostics16030440 · Diagnostics · 2026-02-01

## TL;DR

This paper introduces a transformer-based model for skin disease imaging that improves accuracy and adapts well to new datasets with limited labeled data.

## Contribution

A dermatology-specific foundation model using self-supervised learning and vision transformers for robust and data-efficient lesion classification.

## Key findings

- The model achieves in-dataset classification accuracies of 94.87%, 97.32%, and 98.17% on the ISIC 2018, HAM10000, and PH2 datasets, respectively.
- It outperforms its supervised counterparts by 3.5–5.8% in cross-dataset transfer experiments.
- When fine-tuned with only 10% of the labeled training data, it performs comparably to fully supervised baselines, showing strong data efficiency.

## Abstract

Background/Objectives: Accurate and reliable automated dermoscopic lesion classification remains challenging because of pronounced dataset bias, limited expert-annotated data, and poor cross-dataset generalization of conventional supervised deep learning models. In clinical dermatology, these limitations restrict the deployment of data-driven diagnostic systems across diverse acquisition settings and patient populations. Methods: Motivated by these challenges, this study proposes a transformer-based, dermatology-specific foundation model. The model learns transferable visual representations from large collections of unlabeled dermoscopic images via self-supervised pretraining. It integrates large-scale dermatology-oriented self-supervised learning with a hierarchical vision transformer backbone, which enables effective capture of both fine-grained lesion textures and global morphological patterns. The evaluation is conducted on three publicly available dermoscopic datasets: ISIC 2018, HAM10000, and PH2. The study assesses in-dataset, cross-dataset, limited-label, ablation, and computational-efficiency settings. Results: The proposed approach achieves in-dataset classification accuracies of 94.87%, 97.32%, and 98.17% on ISIC 2018, HAM10000, and PH2, respectively, and outperforms strong transformer and hybrid baselines. Cross-dataset transfer experiments show consistent performance gains of 3.5–5.8% over supervised counterparts, indicating improved robustness to domain shift. Furthermore, when fine-tuned with only 10% of the labeled training data, the model achieves performance comparable to fully supervised baselines, which highlights strong data efficiency. Conclusions: These results demonstrate that dermatology-specific foundation learning offers a principled and practical solution for robust dermoscopic lesion classification under realistic clinical constraints.
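The limited-label setting mentioned in the abstract (fine-tuning with only 10% of the labeled training data) can be illustrated with a small sketch. This summary does not say how the authors drew their 10% subset, so the class-stratified sampling below is an assumption, and the function name `stratified_label_subset` and the toy class names are hypothetical. It only shows one common way to pick a label budget while preserving class proportions.

```python
import random
from collections import defaultdict


def stratified_label_subset(labels, fraction=0.10, seed=0):
    """Return sorted indices of a subset holding roughly `fraction` of each class.

    Illustrative only: the paper's actual sampling protocol is not
    described in this summary.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for indices in by_class.values():
        # Keep at least one example per class so rare classes are not dropped.
        k = max(1, round(fraction * len(indices)))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical toy labels standing in for a dermoscopic training split.
labels = ["nevus"] * 70 + ["melanoma"] * 20 + ["keratosis"] * 10
subset = stratified_label_subset(labels, fraction=0.10, seed=0)
```

The returned indices would select the examples whose labels are kept for fine-tuning. Stratifying per class matters for dermoscopic datasets, which tend to be heavily imbalanced, because a uniform 10% draw could leave a rare class with no labeled examples at all.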

## Full-text entities

- **Diseases:** Skin Disease (MESH:D012871), lesion (MESH:D009059)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12897459/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12897459/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC12897459/full.md

---
Source: https://tomesphere.com/paper/PMC12897459