# Foundation-Model-Driven Skin Lesion Segmentation and Classification Using SAM-Adapters and Vision Transformers

**Authors:** Faisal Binzagr, Majed Hariri

PMC · DOI: 10.3390/diagnostics16030468 · Diagnostics · 2026-02-03

## TL;DR

This paper introduces a new framework combining SAM-Adapters and Vision Transformers to improve skin lesion segmentation and classification for skin cancer diagnosis.

## Contribution

The framework pairs a SAM encoder fine-tuned through lightweight adapters (SAM-Adapters) for lesion segmentation with a ViT-based classifier that uses the segmentation output for lesion-specific cropping and cross-attention fusion.
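
The adapter idea can be illustrated with the common bottleneck design. A small down-projection, a nonlinearity, and an up-projection are added residually to the output of a frozen encoder block, so only the adapter weights are trained. The sketch below is pure Python and rests on assumptions the summary does not state. The ReLU nonlinearity, the zero-initialized up-projection, and the sizes are illustrative, and `Adapter`, `matvec`, and `trainable_params` are hypothetical names. The paper's SAM-Adapter may differ in placement and structure.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class Adapter:
    """Hypothetical bottleneck adapter: y = x + W_up relu(W_down x + b_down) + b_up.

    Only these adapter weights would be updated during fine-tuning; the
    surrounding SAM encoder weights are assumed to stay frozen.
    """

    def __init__(self, dim: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random down-projection: dim -> bottleneck.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Zero-initialized up-projection: bottleneck -> dim. At initialization the
        # adapter is an identity map, so it does not disturb the frozen encoder.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]
        self.b_up = [0.0] * dim

    def __call__(self, x: Vector) -> Vector:
        hidden = [max(0.0, h + b) for h, b in zip(matvec(self.w_down, x), self.b_down)]
        delta = [u + b for u, b in zip(matvec(self.w_up, hidden), self.b_up)]
        return [xi + di for xi, di in zip(x, delta)]  # residual connection

    def trainable_params(self) -> int:
        dim, r = len(self.w_up), len(self.w_down)
        # Two weight matrices (dim * r each) plus the two bias vectors.
        return 2 * dim * r + r + dim


if __name__ == "__main__":
    adapter = Adapter(dim=8, bottleneck=2)
    x = [float(i) for i in range(8)]
    print(adapter(x) == x)  # True: zero-initialized up-projection gives an identity map
    print(Adapter(dim=768, bottleneck=64).trainable_params())  # 99136
```

With the assumed sizes (d = 768, r = 64), one adapter adds 99,136 trainable parameters. That is a small fraction of the roughly 7M parameters in a single ViT-B-sized transformer block, which is the sense in which such adapters are lightweight.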

## Key findings

- The proposed method achieves a Dice score of 94.27% on ISIC 2018 for segmentation.
- Classification accuracy reaches 95.88% on ISIC 2018 and 96.37% on HAM10000.
- Ablation studies confirm the importance of SAM-Adapters and lesion-specific fusion for performance.
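
The Dice score reported above measures the overlap between a predicted mask P and a reference mask G as Dice = 2|P ∩ G| / (|P| + |G|). Below is a minimal pure-Python sketch on binary masks stored as nested lists. The function name `dice_score` is hypothetical, and so is the convention of returning 1.0 when both masks are empty. The summary also does not say whether the paper averages Dice per image or pools it over the whole test set.

```python
from typing import Sequence


def dice_score(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Dice coefficient 2|P ∩ G| / (|P| + |G|) for two binary masks of equal shape."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    total = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p, t = int(bool(p)), int(bool(t))  # treat any nonzero value as foreground
            intersection += p & t
            total += p + t
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    pred = [[0, 1, 1],
            [0, 1, 1],
            [0, 0, 0]]
    gt = [[0, 1, 1],
          [0, 1, 0],
          [0, 1, 0]]
    print(dice_score(pred, gt))  # 0.75: 3 shared pixels, 4 + 4 foreground pixels
```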

## Abstract

Background: Precise segmentation and classification of dermoscopic images remain major obstacles in automated skin cancer evaluation. This is due in part to lesion variability, low-contrast borders, and background artifacts. Recent foundation models, most notably the Segment Anything Model (SAM), show strong generalization potential but require domain-specific adaptation to work effectively in medical imaging. Newer architectures, particularly Vision Transformers (ViTs), broaden the options for robust lesion identification; however, their strengths are limited without spatial priors.

Methods: This study presents an integrated foundation-model-based framework with two components. SAM-Adapter fine-tuning handles lesion segmentation, and a ViT-based classifier uses lesion-specific cropping derived from the segmentation together with cross-attention fusion. The SAM encoder is kept frozen and only lightweight adapters are fine-tuned, adding skin-surface-specific capacity. Segmentation priors enter the classification stage by being fused with the image patch embeddings, which yields lesion-centric reasoning. The entire pipeline is trained jointly with a multi-task objective on data from the ISIC 2018, HAM10000, and PH2 datasets.

Results: In extensive experiments, the proposed method outperforms state-of-the-art segmentation and classification methods across the datasets. On ISIC 2018, it achieves a Dice score of 94.27% for segmentation and a classification accuracy of 95.88%. It reaches a Dice score of 95.62% on PH2 and an accuracy of 96.37% on HAM10000. Several ablation analyses confirm that both the SAM-Adapters and the lesion-specific cropping with cross-attention fusion contribute substantially to performance.
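
The fusion step described under Methods can be illustrated with single-head scaled dot-product cross-attention, softmax(QKᵀ/√d)V, in which each image patch embedding queries tokens derived from the segmentation mask. This is a minimal sketch under assumptions the summary does not confirm. We assume patches act as queries and mask-prior tokens supply keys and values, with no learned Q/K/V projections, a single head, and a residual add. How the mask tokens are produced is left open. The names `cross_attention` and `mask_tokens` are hypothetical. On the toy input in the code, every row of attention weights sums to 1. The third patch, [1, 1], is equally similar to both mask tokens, so it splits its attention 50/50 and its fused embedding becomes [1.5, 1.5].

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product cross-attention, softmax(Q K^T / sqrt(d)) V.

    queries: image patch embeddings, shape (num_patches, d)
    keys, values: segmentation-prior tokens, shape (num_mask_tokens, d)
    Returns (fused patch embeddings, attention weights). Learned Q/K/V
    projections are omitted for brevity.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    # One row of attention weights per patch, over all mask tokens.
    weights = [softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
               for q in queries]
    # Weighted sum of mask-token values for each patch.
    attended = [[sum(w * v[j] for w, v in zip(w_row, values)) for j in range(len(values[0]))]
                for w_row in weights]
    # Residual add (requires value dim == d) keeps the original patch content.
    fused = [[p + a for p, a in zip(p_row, a_row)] for p_row, a_row in zip(queries, attended)]
    return fused, weights


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 patch embeddings, d = 2
    mask_tokens = [[1.0, 0.0], [0.0, 1.0]]          # 2 hypothetical mask-prior tokens
    fused, attn = cross_attention(patches, mask_tokens, mask_tokens)
    print([round(sum(row), 6) for row in attn])     # [1.0, 1.0, 1.0]
    print(fused[2])                                 # [1.5, 1.5]
```
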
Paired t-tests confirm statistical significance for all reported measures: improvements over strong baselines reach p < 0.01 for most comparisons, with large effect sizes.

Conclusions: The results indicate that combining segmentation priors from foundation models with transformer-based classification consistently and reliably improves both lesion-boundary quality and diagnostic accuracy. The proposed SAM-ViT framework thus delivers robust, generalizable, lesion-centric automated dermoscopic analysis and represents a promising initial step toward clinically deployable skin cancer decision-support systems. Next steps include model compression, improved pseudo-mask refinement, and evaluation on real-world multi-center clinical cohorts.
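
A paired t-test compares per-pair score differences d_i between the proposed model and a baseline. The pairs might be folds or runs; the summary does not say which unit the authors paired on. The statistic is t = mean(d) / (s_d / √n) with n − 1 degrees of freedom. For effect size, one common choice for paired data is Cohen's d_z = mean(d) / s_d. The summary does not state which effect-size measure the paper used. Below is a minimal standard-library sketch with a hypothetical helper, `paired_t`. The scores in the usage block are placeholders, not results from the paper. Computing the p-value requires the t-distribution; in practice, `scipy.stats.ttest_rel` returns both the statistic and the p-value.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(model: Sequence[float], baseline: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Cohen's d_z) for paired scores of two models.

    With per-pair differences d_i = model_i - baseline_i:
      t   = mean(d) / (stdev(d) / sqrt(n))   with n - 1 degrees of freedom
      d_z = mean(d) / stdev(d)
    where stdev is the sample standard deviation (n - 1 denominator).
    """
    if len(model) != len(baseline) or len(model) < 2:
        raise ValueError("need at least two paired scores of equal length")
    diffs = [m - b for m, b in zip(model, baseline)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    if sd_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    d_z = mean_d / sd_d
    t = d_z * math.sqrt(len(diffs))  # equivalent to mean_d / (sd_d / sqrt(n))
    return t, d_z


if __name__ == "__main__":
    # Placeholder per-fold accuracies (%), NOT results from the paper.
    model = [95.9, 96.1, 95.6, 96.3, 95.8]
    baseline = [94.2, 94.8, 94.0, 94.9, 94.5]
    t, d_z = paired_t(model, baseline)
    print(f"t = {t:.2f}, df = {len(model) - 1}, d_z = {d_z:.2f}")
```

Deriving t from d_z keeps the two quantities consistent. It also makes clear that, for a fixed effect size, the t statistic grows with the square root of the number of pairs.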

## Linked entities

- **Diseases:** skin cancer (MONDO:0002898)

## Full-text entities

- **Diseases:** skin cancer (MESH:D012878), Skin Lesion (MESH:D012871)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12896749/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12896749/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC12896749/full.md

---
Source: https://tomesphere.com/paper/PMC12896749