Fine-Tuning a Small Vision Language Model Using Synthetic Data for Explaining Bacterial Skin Disease Images
Shiwan Zhang, Abdurrahim Yilmaz, Gulsum Gencoglan, Burak Temelkuran

TL;DR
This paper explores fine-tuning a small vision language model (VLM) with synthetic data to explain bacterial skin disease images; the best configuration reaches 70.20% classification accuracy.
Contribution
The study introduces a method for fine-tuning a compact VLM (SmolVLM) for dermatology image analysis using synthetic question–answer (QA) supervision generated by Google’s Gemini model.
Findings
QA-only supervision results in the best report-generation performance.
The combined QA+caption strategy achieves the highest classification accuracy of 70.20%.
Synthetic data effectively enhances compact VLMs for medical image understanding.
Abstract
Background/Objectives: Vision language models (VLMs) show strong potential for medical image understanding, but their large scale often limits practical deployment. This study investigates whether a compact VLM can be effectively adapted for dermatology, with a focus on explaining bacterial skin disease images. Methods: We curate a dataset derived from PMC-OA using the BIOMEDICA dataset and construct PMC-derma-VQA-bacteria by pairing images with inherited figure captions and synthetically generated question–answer (QA) supervision produced by Google’s Gemini model. SmolVLM is fine-tuned under three supervision settings: QA-only, caption-only, and a combined QA+caption strategy. The models are evaluated on a held-out test set for both text-generation quality and diagnostic classification performance. Results: QA-only supervision yields the best report-generation performance, while the…
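The three supervision settings named in the abstract (QA-only, caption-only, and combined QA+caption) can be pictured as three ways of turning one image record into training samples. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the Record schema, the build_samples helper, the setting names, and the caption instruction text are all hypothetical, and the example record is made-up placeholder content rather than data from PMC-derma-VQA-bacteria.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    """One image with its inherited caption and synthetic QA pairs.

    Hypothetical schema; the paper's actual data format is not shown in this summary.
    """
    image_path: str
    caption: str
    qa_pairs: List[Tuple[str, str]] = field(default_factory=list)


# Hypothetical instruction used to frame a caption as a prompt/target pair.
CAPTION_PROMPT = "Describe this image."

SETTINGS = ("qa", "caption", "qa+caption")


def build_samples(record: Record, setting: str) -> List[Dict[str, str]]:
    """Expand one record into (image, prompt, target) samples for a setting."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting {setting!r}; expected one of {SETTINGS}")

    samples: List[Dict[str, str]] = []

    # QA supervision: one sample per synthetic question-answer pair.
    if setting in ("qa", "qa+caption"):
        for question, answer in record.qa_pairs:
            samples.append(
                {"image": record.image_path, "prompt": question, "target": answer}
            )

    # Caption supervision: one sample pairing a fixed instruction with the caption.
    if setting in ("caption", "qa+caption"):
        samples.append(
            {"image": record.image_path, "prompt": CAPTION_PROMPT, "target": record.caption}
        )

    return samples


# Placeholder record for illustration only.
example = Record(
    image_path="images/example_0001.jpg",
    caption="Placeholder figure caption.",
    qa_pairs=[
        ("What is the most likely diagnosis?", "Placeholder diagnosis."),
        ("Which visual features support it?", "Placeholder features."),
    ],
)

for setting in SETTINGS:
    print(setting, len(build_samples(example, setting)))
```

With this layout, the combined setting is the union of the other two, so a record with n QA pairs contributes n samples under QA-only, one under caption-only, and n + 1 under QA+caption. Whether the authors balance or mix the two sources differently is not stated in the summary above.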
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Cell Image Analysis Techniques
