DermaSynth: Rich Synthetic Image-Text Pairs Using Open Access Dermatology Datasets
Abdurrahim Yilmaz, Furkan Yuceyalcin, Ece Gokyayla, Donghee Choi, Ozan Erdem, Ali Anil Demircali, Rahmetullah Varol, Ufuk Gorkem Kirabali, Gulsum Gencoglan, Joram M. Posma, Burak Temelkuran

TL;DR
DermaSynth is a large, synthetic dermatology image-text dataset created using advanced language models and open access images, aimed at advancing AI research in dermatology.
Contribution
We introduce DermaSynth, a novel synthetic dataset of over 92,000 image-text pairs for dermatology, generated using state-of-the-art LLMs and metadata integration.
Findings
The dataset contains 92,020 synthetic image-text pairs.
A preliminary dermatology-specific vision-language model, DermatoLlama 1.0, was fine-tuned on 5,000 samples.
The dataset is intended to support AI research and development in dermatology.
Abstract
A major barrier to developing vision large language models (LLMs) in dermatology is the lack of large image-text pair datasets. We introduce DermaSynth, a dataset comprising 92,020 synthetic image-text pairs curated from 45,205 images (13,568 clinical and 35,561 dermatoscopic) for dermatology-related clinical tasks. Using a state-of-the-art LLM, Gemini 2.0, we applied clinically related prompts and the self-instruct method to generate diverse and rich synthetic texts. Metadata from the datasets were incorporated into the input prompts with the aim of reducing potential hallucinations. The resulting dataset builds upon open access dermatological image repositories (DERM12345, BCN20000, PAD-UFES-20, SCIN, and HIBA) that have permissive CC-BY-4.0 licenses. We also fine-tuned a preliminary Llama-3.2-11B-Vision-Instruct model, DermatoLlama 1.0, on 5,000 samples. We anticipate this…
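The abstract mentions that dataset metadata were placed into the input prompts to limit hallucinations. As a rough illustration of that idea only, the sketch below composes a text prompt that lists known metadata before the task. The function name `build_grounded_prompt`, the metadata field names, and the prompt wording are all hypothetical and are not taken from the paper; the actual prompts and the call to the model are not shown in this summary.

```python
from typing import Mapping


def build_grounded_prompt(metadata: Mapping[str, object], task: str) -> str:
    """Compose a text prompt that states known metadata before the task.

    Hypothetical helper: field names and wording are assumptions,
    not the prompts used to build DermaSynth.
    """
    # Keep only fields that are present, so the prompt never asserts missing facts.
    facts = [
        f"- {key}: {value}"
        for key, value in metadata.items()
        if value not in (None, "")
    ]
    fact_block = "\n".join(facts) if facts else "- (no metadata available)"
    return (
        "You are assisting with a dermatology image.\n"
        "Known metadata for this image (treat as ground truth):\n"
        f"{fact_block}\n"
        "Do not state findings that contradict the metadata.\n"
        f"Task: {task}"
    )


# Example record with assumed field names; body_site is missing and is skipped.
example = {
    "diagnosis": "melanocytic nevus",
    "image_type": "dermatoscopic",
    "body_site": None,
}
prompt = build_grounded_prompt(example, "Write a question and answer about this lesion.")
print(prompt)
```

The resulting string would be sent together with the image to a vision-capable model. Listing the metadata explicitly gives the model a fixed reference to stay consistent with, which is one plausible reading of how metadata can reduce hallucinated details.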
Taxonomy
Topics: AI in cancer detection · Mycobacterium research and diagnosis · Digital Imaging in Medicine
