# Fine-Tuning a Small Vision Language Model Using Synthetic Data for Explaining Bacterial Skin Disease Images

**Authors:** Shiwan Zhang, Abdurrahim Yilmaz, Gulsum Gencoglan, Burak Temelkuran

PMC · DOI: 10.3390/diagnostics16040603 · 2026-02-18

## TL;DR

This paper fine-tunes a compact vision language model (SmolVLM) on synthetic question-answer data to explain bacterial skin disease images, reaching 70.20% classification accuracy with combined QA+caption supervision.

## Contribution

The study fine-tunes a compact VLM (SmolVLM) for dermatology image analysis using Gemini-generated synthetic QA supervision, and constructs the PMC-derma-VQA-bacteria dataset from PMC-OA figures and their captions.

## Key findings

- QA-only supervision results in the best report-generation performance.
- The combined QA+caption strategy achieves the highest classification accuracy of 70.20%.
- Synthetic data effectively enhances compact VLMs for medical image understanding.
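The classification accuracy reported above is the fraction of test images whose predicted diagnosis matches the reference label. The snippet below is a minimal sketch of that calculation, not the authors' evaluation code: the function name `classification_accuracy`, the case-insensitive exact-match rule, and the example labels are all assumptions made for illustration.

```python
def classification_accuracy(predictions, labels):
    """Return the fraction of predictions that match their reference label.

    Matching is case-insensitive and ignores surrounding whitespace.
    This matching rule is an assumption; the paper may score differently.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(
        pred.strip().lower() == gold.strip().lower()
        for pred, gold in zip(predictions, labels)
    )
    return correct / len(labels)


# Hypothetical example: three of four predictions match the reference.
example_predictions = ["Impetigo", "folliculitis", "cellulitis", "impetigo"]
example_labels = ["impetigo", "folliculitis", "impetigo", "impetigo"]
example_accuracy = classification_accuracy(example_predictions, example_labels)
```

Under this definition, a score of 70.20% means that roughly seven in ten held-out images received the correct label.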

## Abstract

Background/Objectives: Vision language models (VLMs) show strong potential for medical image understanding, but their large scale often limits practical deployment. This study investigates whether a compact VLM can be effectively adapted for dermatology, with a focus on explaining bacterial skin disease images.

Methods: We curate a dataset derived from PMC-OA using the BIOMEDICA dataset and construct PMC-derma-VQA-bacteria by pairing images with inherited figure captions and synthetically generated question–answer (QA) supervision produced by Google’s Gemini model. SmolVLM is fine-tuned under three supervision settings: QA-only, caption-only, and a combined QA+caption strategy. The models are evaluated on a held-out test set for both text-generation quality and diagnostic classification performance.

Results: QA-only supervision yields the best report-generation performance, while the combined QA+caption setting achieves the highest classification accuracy (70.20%).

Conclusions: Synthetic QA supervision can meaningfully enhance compact VLMs for medical image understanding and diagnostic support in dermatology.
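The three supervision settings named in the abstract (QA-only, caption-only, QA+caption) can be pictured as three ways of turning the same image records into training samples. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `Record` fields, the `build_samples` helper, the setting names, and the fixed caption prompt are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One image with its inherited caption and synthetic QA pairs (hypothetical schema)."""
    image_path: str
    caption: str
    qa_pairs: list = field(default_factory=list)  # list of (question, answer) tuples


# Assumed instruction used to turn a caption into a prompt/target pair.
CAPTION_PROMPT = "Describe this image."

VALID_SETTINGS = ("qa", "caption", "qa+caption")


def build_samples(records, setting):
    """Expand records into prompt/target samples for one supervision setting."""
    if setting not in VALID_SETTINGS:
        raise ValueError(f"unknown setting: {setting!r}")
    samples = []
    for rec in records:
        if setting in ("qa", "qa+caption"):
            # One sample per synthetic question-answer pair.
            for question, answer in rec.qa_pairs:
                samples.append(
                    {"image": rec.image_path, "prompt": question, "target": answer}
                )
        if setting in ("caption", "qa+caption"):
            # One sample per image using the inherited figure caption.
            samples.append(
                {"image": rec.image_path, "prompt": CAPTION_PROMPT, "target": rec.caption}
            )
    return samples


# Hypothetical record, for illustration only.
demo_records = [
    Record(
        image_path="img_0001.jpg",
        caption="Clinical photograph of a skin lesion.",
        qa_pairs=[
            ("What is shown in the image?", "A skin lesion."),
            ("Where is the lesion located?", "On the forearm."),
        ],
    )
]
qa_samples = build_samples(demo_records, "qa")
caption_samples = build_samples(demo_records, "caption")
combined_samples = build_samples(demo_records, "qa+caption")
```

In this sketch the combined setting is simply the union of the other two, so every image contributes both its QA pairs and its caption. Whether the paper mixes or weights the two sources differently is not stated in the abstract.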

## Linked entities

- **Diseases:** bacterial skin disease (MONDO:0024295)

## Full-text entities

- **Diseases:** acne vulgaris (MESH:D000152), Bacterial Skin Disease (MESH:D017192), bacterial disease (MESH:D001424), dermatological disorders (MESH:D000168), hallucinations (MESH:D006212), OA (MESH:D010003), LLMs (MESH:D007806), folliculitis (MESH:D005499), pigmentation (MESH:D010859), lesion/disease (MESH:D004194), injury to (MESH:D014947), Verrucae; Dermatophyte (tinea) infections; Candidiasis; Psoriasis; Lichen; Alopecia Areata; Vitiligo; Atopic (MESH:D014860), VLMs (MESH:D014786), PMC (MESH:D020967), Skin Conditions; Sweat (MESH:D012871)
- **Species:** Bacteria Latreille et al. 1825 (Bacteria stick insect, genus) [taxon 629395], Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12939511/full.md

---
Source: https://tomesphere.com/paper/PMC12939511