# Clinical Application of Vision Transformers for Melanoma Classification: A Multi-Dataset Evaluation Study

**Authors:** Antony Garcia, Jixing Zhou, Gabriela Pinero-Crespo, Thomas Beachkofsky, Xinming Huang

PMC · DOI: 10.3390/cancers17213447 · Cancers · 2025-10-28

## TL;DR

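This study fine-tunes a ViT-L/16 Vision Transformer to classify melanoma from dermoscopic images. On the external MN187 dataset it reached a ROC-AUC of 0.902, rising to 0.915 with GAN-based augmentation, ahead of the compared deep learning baselines and the commercial MoleAnalyzer Pro system.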

## Contribution

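The study fine-tunes a ViT-L/16 Vision Transformer on ISIC 2019 data augmented with confidence-filtered StyleGAN2-ADA images and evaluates it on the external, biopsy-confirmed MN187 dataset, where it achieves a higher ROC-AUC than the commercial MoleAnalyzer Pro system.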
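The abstract states that only high-confidence synthetic images were retained, but this summary does not give the filtering rule. The following is a minimal sketch of one plausible reading, assuming that a classifier scores each synthetic image and that an image is kept when the probability of its intended class meets a threshold. The `SyntheticImage` type, the `filter_by_confidence` helper, the 0.9 threshold, and the sample records are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SyntheticImage:
    image_id: str
    intended_label: str  # class the generator was asked to produce
    class_probs: dict    # hypothetical classifier probabilities per class


def filter_by_confidence(images, threshold=0.9):
    """Keep images whose intended class is predicted with probability >= threshold.

    The threshold value is an assumption; the paper's actual cut-off is not
    stated in this summary.
    """
    kept = []
    for img in images:
        if img.class_probs.get(img.intended_label, 0.0) >= threshold:
            kept.append(img)
    return kept


# Hypothetical candidates, for illustration only.
candidates = [
    SyntheticImage("syn_001", "melanoma", {"melanoma": 0.97, "nevus": 0.03}),
    SyntheticImage("syn_002", "melanoma", {"melanoma": 0.55, "nevus": 0.45}),
    SyntheticImage("syn_003", "nevus", {"melanoma": 0.08, "nevus": 0.92}),
]

kept = filter_by_confidence(candidates, threshold=0.9)
print([img.image_id for img in kept])  # ['syn_001', 'syn_003']
```

Filtering on the intended class, rather than on the highest-scoring class, discards images that look convincing but belong to the wrong category, which matters when the synthetic data is also used to balance classes.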

## Key findings

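- The ViT-L/16 model achieved a baseline ROC-AUC of 0.902 on the external MN187 dataset, surpassing the baseline models and the MoleAnalyzer Pro system, though the difference was not statistically significant (p = 0.07).
- Adding 46,000 confidence-filtered GAN-generated images raised the ROC-AUC to 0.915, a statistically significant improvement over MoleAnalyzer Pro (p = 0.032).
- The authors credit part of this performance to the global feature representation that self-attention provides, in contrast to the local features that CNNs often depend on.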
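For readers less familiar with the metric, ROC-AUC is the probability that a randomly chosen positive case (here, melanoma) receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. The `roc_auc` helper and the toy labels and scores are illustrative assumptions and do not reproduce the MN187 data.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for melanoma, 0 for benign; scores: higher means more suspicious.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 melanomas and 4 benign lesions (hypothetical scores).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.917 (11 of 12 pairs ranked correctly)
```

The pairwise loop is quadratic in the number of cases, which is fine for a dataset of a few hundred lesions. Library routines use a rank-based formulation that gives the same value more efficiently.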

## Abstract

Melanoma is a dangerous skin cancer that can be treated successfully when detected early, but it often looks similar to benign moles, which makes diagnosis difficult. This research uses Vision Transformers to help identify melanoma from skin images. The model was trained with real medical images and additional synthetic ones produced by a Deep Learning algorithm to improve learning. Its performance was tested against several Deep Learning classification models and a commercial diagnostic tool. The Vision Transformer achieved higher accuracy in separating cancerous and non-cancerous lesions. This approach may help doctors make faster and more confident assessments when examining skin images, supporting better detection of melanoma in clinical settings.

**Background:** Melanoma is one of the most lethal skin cancers, with survival rates largely dependent on early detection, yet diagnosis remains difficult because of its visual similarity to benign nevi. Convolutional neural networks have achieved strong performance in dermoscopic analysis but often depend on fixed input sizes and local features, which can limit generalization. Vision Transformers, which capture global image relationships through self-attention, offer a promising alternative.

**Methods:** A ViT-L/16 model was fine-tuned using the ISIC 2019 dataset containing more than 25,000 dermoscopic images. To expand the dataset and balance class representation, synthetic melanoma and nevus images were produced with StyleGAN2-ADA, retaining only high-confidence outputs. Model performance was evaluated on an external biopsy-confirmed dataset (MN187) and compared with CNN baselines (ResNet-152, DenseNet-201, EfficientNet-B7, ConvNeXt-XL, ViT-B/16) and the commercial MoleAnalyzer Pro system using ROC-AUC and DeLong’s test.

**Results:** The ViT-L/16 model reached a baseline ROC-AUC of 0.902 on MN187, surpassing all CNN baselines and the MoleAnalyzer Pro system, though the difference was not statistically significant (p = 0.07). After adding 46,000 confidence-filtered GAN-generated images, the ROC-AUC increased to 0.915, giving a statistically significant improvement over the commercial MoleAnalyzer Pro system (p = 0.032).

**Conclusions:** Vision Transformers show strong potential for melanoma classification, especially when combined with GAN-based augmentation, offering advantages in global feature representation and data expansion that support the development of dependable AI-driven clinical decision-support systems.
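The reported p-values come from DeLong's test, which compares two ROC-AUCs measured on the same cases while accounting for their correlation. The sketch below follows the standard textbook formulation for two paired classifiers. It is not the authors' code, and the helper names (`delong_test`, `_placements`, `_cov`) and the toy scores are hypothetical.

```python
import math


def _psi(x, y):
    """Pair kernel: 1 if the positive outranks the negative, 0.5 on a tie."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _placements(pos, neg):
    """Per-case structural components used by DeLong's variance estimate."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Unbiased sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def delong_test(labels, scores_a, scores_b):
    """Return (auc_a, auc_b, two-sided p-value) for two paired classifiers."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos_idx), len(neg_idx)
    if m < 2 or n < 2:
        raise ValueError("need at least two positive and two negative cases")

    v10a, v01a = _placements([scores_a[i] for i in pos_idx],
                             [scores_a[i] for i in neg_idx])
    v10b, v01b = _placements([scores_b[i] for i in pos_idx],
                             [scores_b[i] for i in neg_idx])
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m

    # Variance of the AUC difference, including the covariance between models.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 1e-15:
        return auc_a, auc_b, float("nan")  # test undefined with zero variance

    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, p_value


# Toy example: two models scored on the same 8 lesions (hypothetical values).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
scores_b = [0.8, 0.5, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

auc_a, auc_b, p = delong_test(labels, scores_a, scores_b)
print(auc_a, auc_b)  # 0.9375 0.75
```

Because both models are scored on the same lesions, their AUC estimates are correlated. Subtracting the covariance terms is what separates this paired test from a comparison that treats the two AUCs as independent.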

## Linked entities

- **Diseases:** melanoma (MONDO:0005105)

## Full-text entities

- **Diseases:** Melanoma (MESH:D008545), benign nevi (MESH:D009506), skin cancers (MESH:D012878)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12607522/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12607522/full.md

## References

74 references — full list in the complete paper: https://tomesphere.com/paper/PMC12607522/full.md

---
Source: https://tomesphere.com/paper/PMC12607522