A Hierarchical Benchmark of Foundation Models for Dermatology
Furkan Yuceyalcin, Abdurrahim Yilmaz, Burak Temelkuran

TL;DR
This paper evaluates ten foundation models on hierarchical skin lesion classification. It reveals a gap in fine-grained diagnostic capability and argues for specialized models in dermatology.
Contribution
It introduces a hierarchical evaluation framework and benchmarks ten foundation models across multiple levels of dermatological classification.
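To make the idea of a hierarchical evaluation concrete, here is a minimal sketch that scores the same set of predictions at several levels of a label taxonomy. It makes three assumptions that the page does not confirm. First, it uses support-weighted F1, because the page does not say which F1 averaging the reported scores use. Second, it derives coarse labels by mapping fine-grained predictions up a hypothetical `hierarchy` dictionary, whereas the paper may train a separate adapter at each level. Third, the label names are illustrative and are not the DERM12345 taxonomy.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted F1 over the classes that occur in y_true (assumed averaging)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total  # weight each class by its share of samples
    return score


def hierarchical_scores(y_true_fine, y_pred_fine, hierarchy):
    """Score fine-grained predictions at the fine level and at every coarser level.

    hierarchy (hypothetical): {level_name: {fine_label: label_at_that_level}}
    """
    scores = {"fine": weighted_f1(y_true_fine, y_pred_fine)}
    for level, mapping in hierarchy.items():
        coarse_true = [mapping[y] for y in y_true_fine]
        coarse_pred = [mapping[y] for y in y_pred_fine]
        scores[level] = weighted_f1(coarse_true, coarse_pred)
    return scores


# Toy, hypothetical taxonomy: four subclasses grouped into a binary malignancy level.
hierarchy = {
    "binary": {
        "melanoma": "malignant",
        "bcc": "malignant",
        "nevus": "benign",
        "seb_keratosis": "benign",
    }
}
y_true = ["melanoma", "nevus", "bcc", "seb_keratosis"]
y_pred = ["bcc", "nevus", "melanoma", "seb_keratosis"]  # two within-group confusions
scores = hierarchical_scores(y_true, y_pred, hierarchy)
print(scores)  # → {'fine': 0.5, 'binary': 1.0}
```

In the toy example the fine-level errors stay inside the same malignancy group, so the binary score is unaffected. This is one simple way a high score on a broad task can coexist with a much lower score on fine-grained subtypes.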
Findings
MedImageInsights achieved 97.52% F1 on binary malignancy detection.
Performance drops to 65.50% F1 on 40-class subtype classification.
Dermatology-specific models excel at fine-grained classification but underperform on broad tasks.
Abstract
Foundation models have transformed medical image analysis by providing robust feature representations that reduce the need for large-scale task-specific training. However, current benchmarks in dermatology often reduce the complex diagnostic taxonomy to flat, binary classification tasks, such as distinguishing melanoma from benign nevi. This oversimplification obscures a model's ability to perform fine-grained differential diagnosis, which is critical for integration into clinical workflows. This study evaluates the utility of embeddings derived from ten foundation models, spanning general computer vision, general medical imaging, and dermatology-specific domains, for hierarchical skin lesion classification. Using the DERM12345 dataset, which comprises 40 lesion subclasses, we computed frozen embeddings and trained lightweight adapter models using five-fold cross-validation. We introduce…
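The protocol described in the abstract (frozen embeddings, a lightweight adapter, five-fold cross-validation) can be sketched as follows. This sketch assumes that the adapter is a linear softmax classifier trained by full-batch gradient descent in NumPy, and it uses plain, unstratified folds. The function names, hyperparameters, and the synthetic "embeddings" in the usage example are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np


def train_linear_adapter(X, y, n_classes, lr=0.1, epochs=500, l2=1e-4):
    """Hypothetical adapter: multinomial logistic regression on frozen embeddings."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)  # softmax probabilities
        G = (P - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G + l2 * W)  # includes gradient of (l2/2)*||W||^2
        b -= lr * G.sum(axis=0)
    return W, b


def predict(W, b, X):
    return np.argmax(X @ W + b, axis=1)


def cross_validate(X, y, n_classes, k=5, seed=0):
    """Plain k-fold CV; the embeddings X stay frozen, only the adapter is trained."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    accs = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Standardize with training-fold statistics only, to avoid leakage.
        mu = X[train_idx].mean(axis=0)
        sd = X[train_idx].std(axis=0) + 1e-8
        Xtr = (X[train_idx] - mu) / sd
        Xte = (X[test_idx] - mu) / sd
        W, b = train_linear_adapter(Xtr, y[train_idx], n_classes)
        accs.append(float(np.mean(predict(W, b, Xte) == y[test_idx])))
    return accs


# Synthetic stand-in for frozen embeddings: 300 samples, 16 dims, 3 well-separated classes.
rng = np.random.default_rng(42)
centers = rng.normal(size=(3, 16)) * 4
y = rng.integers(0, 3, size=300)
X = centers[y] + rng.normal(size=(300, 16))
accs = cross_validate(X, y, n_classes=3)
print(accs)
```

Keeping the backbone frozen means each foundation model's embeddings need to be computed only once. Every fold then trains only a small head, which is what makes a ten-model, five-fold comparison affordable. A stratified split would keep class proportions equal across folds, which matters for rare subclasses in a 40-class problem.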
Taxonomy
Topics: Cutaneous Melanoma Detection and Management · AI in cancer detection · Face recognition and analysis
