MedGemma vs GPT-4: Open-Source and Proprietary Zero-shot Medical Disease Classification from Images

Md. Sazzadul Islam Prottasha; Nabil Walid Rafi

arXiv:2512.23304·cs.CV·April 20, 2026

TL;DR

This study compares the open-source MedGemma and the proprietary GPT-4 on classifying six diseases from medical images. MedGemma, fine-tuned with LoRA, reaches higher accuracy and sensitivity than the untuned GPT-4, which highlights the importance of domain-specific adaptation.

Contribution

The paper demonstrates that a fine-tuned open-source model, MedGemma, achieves higher diagnostic accuracy and clinical sensitivity than the untuned GPT-4 in medical image classification.
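The abstract names Low-Rank Adaptation (LoRA) as the fine-tuning method. As a rough illustration of the general idea only, the sketch below applies the standard LoRA update, W' = W + (alpha / r) · B · A, to a small random weight matrix in plain Python. The dimensions, rank `r`, scaling `alpha`, and all function names are assumptions chosen for illustration; they are not the configuration or code used in the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    inner = len(b)
    return [
        [sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank.

    w: frozen base weight, shape (d, k)
    a: trainable down-projection, shape (r, k)
    b: trainable up-projection, shape (d, r)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def trainable_params(d, k, r):
    """Number of trainable parameters in the two low-rank factors."""
    return r * (d + k)


# Illustrative sizes only (hypothetical, far smaller than any real layer).
random.seed(0)
d, k, r, alpha = 8, 6, 2, 4

w = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen
a = [[random.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # small random init
b = [[0.0] * r for _ in range(d)]                                  # zero init

# With B initialised to zero, the adapted layer starts identical to the base layer.
w_eff = lora_effective_weight(w, a, b, alpha)

print("full-matrix parameters:", d * k)
print("LoRA trainable parameters:", trainable_params(d, k, r))
```

Only `a` and `b` would be updated during training while `w` stays frozen, which is why the trainable parameter count scales with `r * (d + k)` rather than `d * k`.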

Findings

01

The LoRA fine-tuned MedGemma-4b-it achieved a mean test accuracy of 80.37%, versus 69.58% for the untuned GPT-4.

02

MedGemma showed higher sensitivity than GPT-4 in cancer and pneumonia detection.

03

Fine-tuning reduces hallucinations, improving clinical reliability.

Abstract

Multimodal Large Language Models (LLMs) introduce an emerging paradigm for medical imaging by interpreting scans through the lens of extensive clinical knowledge, offering a transformative approach to disease classification. This study presents a critical comparison between two fundamentally different AI architectures: the specialized open-source agent MedGemma and the proprietary large multimodal model GPT-4 for diagnosing six different diseases. The MedGemma-4b-it model, fine-tuned using Low-Rank Adaptation (LoRA), demonstrated superior diagnostic capability by achieving a mean test accuracy of 80.37% compared to 69.58% for the untuned GPT-4. Furthermore, MedGemma exhibited notably higher sensitivity in high-stakes clinical tasks, such as cancer and pneumonia detection. Quantitative analysis via confusion matrices and classification reports provides comprehensive insights into model…
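The accuracy and sensitivity figures discussed here are the kind of quantities read off a confusion matrix. The following is a minimal sketch of that calculation in plain Python. The labels and predictions are invented toy data, not results from the paper, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where rows are true classes and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of all samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def sensitivity(matrix, class_index):
    """Recall for one class: true positives / all samples of that class."""
    row_total = sum(matrix[class_index])
    return matrix[class_index][class_index] / row_total if row_total else 0.0


# Toy data (invented for illustration): 6 normal and 4 pneumonia cases.
labels = ["normal", "pneumonia"]
y_true = ["normal"] * 6 + ["pneumonia"] * 4
y_pred = ["normal"] * 5 + ["pneumonia"] + ["pneumonia"] * 3 + ["normal"]

matrix = confusion_matrix(y_true, y_pred, labels)
print("confusion matrix:", matrix)
print("accuracy:", accuracy(matrix))
print("pneumonia sensitivity:", sensitivity(matrix, labels.index("pneumonia")))
```

In a screening setting, sensitivity for the disease class is often watched more closely than overall accuracy, because a missed positive case (a false negative) usually carries a higher clinical cost than a false alarm.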
